Abstract. We show how to underapproximate the procedure summaries of recursive
programs over the integers using off-the-shelf analyzers for non-recursive
programs. The novelty of our approach is that the non-recursive program we com- 
pute may capture unboundedly many behaviors of the original recursive program 
for which stack usage cannot be bounded. Moreover, we identify a class of recur- 
sive programs on which our method terminates and returns the precise summary 
relations without underapproximation. Doing so, we generalize a similar result 
for non-recursive programs to the recursive case. Finally, we present experimen- 
tal results of an implementation of our method applied on a number of examples. 



1 Introduction 

Procedure summaries are relations between the input and return values of a procedure, 
resulting from its terminating executions. Computing summaries is important, as they 
are a key enabler for the development of modular verification techniques for inter- 
procedural programs, such as checking safety, termination or equivalence properties. 
Summary computation is, however, challenging in the presence of recursive procedures 
with integer parameters, return values, and local variables. While many analysis tools 
exist for non-recursive programs, only a few address the problem of recursion.

In this paper, we propose a novel technique to generate arbitrarily precise underap- 
proximations of summary relations. Our technique is based on the following idea. The 
control flow of procedural programs is captured precisely by the language of a context-free
grammar. A $k$-index underapproximation of this language (where $k \geq 1$) is obtained
by filtering out those derivations of the grammar that exceed a budget, called index, on
the number (at most $k$) of occurrences of non-terminals occurring at each derivation
step. As expected, the higher the index, the more complete the coverage of the underapproximation.
From there we define the $k$-index summary relations of a program by
considering the $k$-index underapproximation of its control flow.

Our method then reduces the computation of $k$-index summary relations for a recursive
program to the computation of summary relations for a non-recursive program,
which is, in general, easier to compute because of the absence of recursion. The reduction
was inspired by a decidability proof [4] in the context of Petri nets.

The contributions of this paper are threefold. First, we show that, for a given index,
recursive programs can be analyzed using off-the-shelf analyzers designed for non-recursive
programs. Second, we identify a class of recursive programs, with possibly
unbounded stack usage, on which our technique is complete, i.e., it terminates and returns
the precise result. Third, we present experimental results of an implementation of
our method applied on a number of examples.

Related Work The problem of analyzing recursive programs handling integers (in general,
unbounded data domains) has gained significant interest with the seminal work
of Sharir and Pnueli [24]. They proposed two approaches for interprocedural dataflow
analysis. The first one keeps precise values (call strings) up to a limited depth of the recursion
stack. In contrast to the methods based on the call strings approach, our method
can also analyse precisely certain programs for which the stack is unbounded.

The second approach of Sharir and Pnueli is based on computing the least fixed 
point of a system of recursive dataflow equations (the functional approach). This ap- 
proach to interprocedural analysis is based on computing an increasing Kleene sequence 
of summaries for control paths in the program of increasing, but bounded length. Recently [11],
the Newton sequence was shown to converge at least as fast as the Kleene
sequence. The intuition behind the Newton sequence is to consider control paths in the 
program of increasing index, and unbounded length. Our contribution can be seen as 
a technique to compute the iterates of the Newton sequence for programs with integer 
parameters, return values, and local variables. 

The complexity of the functional approach was shown to be polynomial in the size 
of the (finite) abstract domain, in the work of Reps, Horwitz and Sagiv [23]. This result
is achieved by computing summary information, in order to reuse previously computed
information during the analysis. Following up on this line of work, most existing
abstract analyzers, such as InterProc [19], also use relational domains to compute
overapproximations of function summaries - typically widening operators are used to 
ensure termination of fixed point computations. The main difference of our method with 
respect to static analyses is the use of underapproximation instead of overapproxima- 
tion. If the final purpose of the analysis is program verification, our method will not 
return false positives. Moreover, the coverage can be increased by increasing the bound 
on the derivation index. 

Previous works have applied model checking based on abstraction refinement to recursive
programs. One such method, known as nested interpolants, represents programs
as nested word automata [3], which have the same expressive power as the visibly pushdown
grammars used in our paper. Also based on interpolation is the Whale algorithm
0, which combines partial exploration of the execution paths (underapproximation) 
with the overapproximation provided by a predicate-based abstract post operator, in or- 
der to compute summaries that are sufficient to prove a given safety property. Another 
technique, similar to Whale, although not handling recursion, is the Smash algorithm 
[13], which combines may- and must-summaries for compositional verification of safety
properties. These approaches are, however, different in spirit from ours, as their goal is 
proving given safety properties of programs, as opposed to computing the summaries 
of procedures independently of their calling context, which is our case. We argue that 
summary computation can be applied beyond safety checking, e.g., to prove termination 
0, or program equivalence. 
2 Preliminaries



Grammars A context-free grammar (or simply grammar) is a tuple $G = (\mathcal{X}, \Sigma, \delta)$ where
$\mathcal{X}$ is a finite nonempty set of nonterminals, $\Sigma$ is a finite nonempty alphabet and
$\delta \subseteq \mathcal{X} \times (\Sigma \cup \mathcal{X})^*$ is a finite set of productions. The production $(X, w)$ may also be noted
$X \to w$. Also define $head(X \to w) = X$ and $tail(X \to w) = w$. Given two strings
$u, v \in (\Sigma \cup \mathcal{X})^*$ we define a step $u \Rightarrow v$ if there exists a production $(X, w) \in \delta$ and some words
$y, z \in (\Sigma \cup \mathcal{X})^*$ such that $u = yXz$ and $v = ywz$. We use $\Rightarrow^*$ to denote the reflexive
transitive closure of $\Rightarrow$. The language of $G$ produced by a nonterminal $X \in \mathcal{X}$ is the set
$L_X(G) = \{w \in \Sigma^* \mid X \Rightarrow^* w\}$ and we call any sequence of steps from a nonterminal
$X$ to $w \in \Sigma^*$ a derivation from $X$. Given $X \Rightarrow^* w$, we call the sequence $\gamma \in \delta^*$ of
productions used in the derivation a control word and write $X \overset{\gamma}{\Rightarrow} w$ to denote that the
derivation conforms to $\gamma$.

Visibly Pushdown Grammars To model the control flow of procedural programs we
use languages generated by visibly pushdown grammars, a subset of context-free grammars.
In this setting, words are defined over a tagged alphabet $\hat{\Sigma} = \Sigma \cup \langle\Sigma \cup \Sigma\rangle$, where
$\langle\Sigma = \{\langle a \mid a \in \Sigma\}$ intuitively represents procedure call sites and $\Sigma\rangle = \{a\rangle \mid a \in \Sigma\}$ represents
procedure return sites. Formally, a visibly pushdown grammar $G = (\mathcal{X}, \hat{\Sigma}, \delta)$ is a
grammar that has only productions of the following forms, for some $a, b \in \Sigma$:

$X \to a \qquad X \to a\,Y \qquad X \to \langle a\, Y\, b\rangle\, Z$

It is worth pointing out that, for our purposes, we do not need a visibly pushdown grammar
to generate the empty string $\varepsilon$. Each tagged word generated by a visibly pushdown grammar
is associated with a nested word [3], the definition of which we briefly recall. Given a
finite alphabet $\Sigma$, a nested word over $\Sigma$ is a pair $(w, \leadsto)$, where $w = a_1 a_2 \ldots a_n \in \Sigma^*$, and
$\leadsto\ \subseteq \{1, 2, \ldots, n\} \times \{1, 2, \ldots, n\}$ is a set of nesting edges (or simply edges) where:

1. $i \leadsto j$ only if $i < j$, i.e. edges only go forward;

2. $|\{j \mid i \leadsto j\}| \leq 1$ and $|\{i \mid i \leadsto j\}| \leq 1$, i.e. no two edges share a position;

3. if $i \leadsto j$ and $k \leadsto \ell$ then it is not the case that $i < k \leq j < \ell$, i.e. edges do not cross.

Intuitively, we associate a nested word to a tagged word as follows: there is an edge between
tagged symbols $\langle a$ and $b\rangle$ iff both are generated at the same derivation step. For
instance, looking forward at Ex. 2, consider the tagged word $w = \tau_1\tau_2\langle\tau_3\tau_1\tau_5\tau_6\tau_7\tau_3\rangle\tau_4$
resulting from a derivation $Q_1^{init} \Rightarrow^* w$. The nested word associated to $w$ is
$(\tau_1\tau_2\tau_3\tau_1\tau_5\tau_6\tau_7\tau_3\tau_4, \{3 \leadsto 8\})$. Finally, let $w\_nw$ denote the mapping which, given a
tagged word in the language of a visibly pushdown grammar, returns the nested word
thereof.
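
As an illustration, the mapping from tagged words to nested words can be sketched in a few lines of Python. This is only a sketch under assumptions of ours: a call symbol is encoded as the string `"<a"`, a return symbol as `"a>"`, the input is assumed to be well matched (as is any word generated by a visibly pushdown grammar), and the function name `nested_word` is hypothetical. Matching calls and returns with a stack yields the same edges as pairing the symbols generated at the same derivation step.

```python
def nested_word(tagged):
    """Return (letters, edges) for a well-matched tagged word.

    `tagged` is a list of symbols; a call is written "<a", a return "a>",
    any other string is an untagged symbol. Positions are 1-indexed.
    """
    letters, edges, stack = [], set(), []
    for pos, sym in enumerate(tagged, start=1):
        if sym.startswith("<"):          # call site: remember its position
            stack.append(pos)
            letters.append(sym[1:])
        elif sym.endswith(">"):          # return site: close the innermost call
            edges.add((stack.pop(), pos))
            letters.append(sym[:-1])
        else:                            # internal symbol
            letters.append(sym)
    return letters, edges


# The tagged word used in the text (tau_i written "ti").
w = ["t1", "t2", "<t3", "t1", "t5", "t6", "t7", "t3>", "t4"]
letters, edges = nested_word(w)
```

On this word the only nesting edge links positions 3 and 8, as stated above.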

Integer Relations We denote by $\mathbb{Z}$ the set of integers. Let $\mathbf{x} = \{x_1, x_2, \ldots, x_d\}$ be a set of
variables for some $d > 0$. Define $\mathbf{x}'$, the primed variables of $\mathbf{x}$, to be $\{x_1', x_2', \ldots, x_d'\}$. All
variables range over $\mathbb{Z}$. We denote by $\vec{y}$ an ordered sequence $\langle y_1, \ldots, y_k\rangle$ of variables,
and by $|\vec{y}|$ its length $k$. By writing $\vec{y} \subseteq \mathbf{x}$ we mean that each variable in $\vec{y}$ belongs to
$\mathbf{x}$. For sequences $\vec{y}$ and $\vec{z}$ of length $k$, let $\vec{y} = \vec{z}$ stand for the equality $\bigwedge_{i=1}^{k} y_i = z_i$.

A linear term $t$ is a linear combination of the form $a_0 + \sum_{i=1}^{d} a_i x_i$, where $a_0, a_1, \ldots, a_d \in \mathbb{Z}$.
An atomic proposition is a predicate of the form $t \leq 0$, where $t$ is a linear term. We
consider formulae in the first-order logic over atomic propositions $t \leq 0$, also known as
Presburger arithmetic.
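A valuation of $\mathbf{x}$ is a function $\nu : \mathbf{x} \to \mathbb{Z}$. The set of all valuations of $\mathbf{x}$ is denoted
by $\mathbb{Z}^{\mathbf{x}}$. If $\vec{y} = \langle y_1, \ldots, y_k\rangle$ is an ordered sequence of variables, we denote by $\nu(\vec{y})$
the sequence of integers $\langle\nu(y_1), \ldots, \nu(y_k)\rangle$. An arithmetic formula $R(\mathbf{x}, \mathbf{y}')$ defining a
relation $R \subseteq \mathbb{Z}^{\mathbf{x}} \times \mathbb{Z}^{\mathbf{y}}$ is evaluated with
respect to two valuations $\nu_1 \in \mathbb{Z}^{\mathbf{x}}$ and $\nu_2 \in \mathbb{Z}^{\mathbf{y}}$, by replacing each $x \in \mathbf{x}$ by $\nu_1(x)$ and each
$y' \in \mathbf{y}'$ by $\nu_2(y)$ in $R$. The composition of two relations $R_1 \subseteq \mathbb{Z}^{\mathbf{x}} \times \mathbb{Z}^{\mathbf{y}}$ and $R_2 \subseteq \mathbb{Z}^{\mathbf{y}} \times \mathbb{Z}^{\mathbf{z}}$
is denoted by $R_1 \circ R_2 = \{(u, v) \in \mathbb{Z}^{\mathbf{x}} \times \mathbb{Z}^{\mathbf{z}} \mid \exists t \in \mathbb{Z}^{\mathbf{y}} .\ (u, t) \in R_1 \text{ and } (t, v) \in R_2\}$. For a
subset $\mathbf{y} \subseteq \mathbf{x}$, we denote by $\nu{\downarrow}_{\mathbf{y}} \in \mathbb{Z}^{\mathbf{y}}$ the projection of $\nu$ onto the variables $\mathbf{y}$.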

3 Integer Recursive Programs 

We consider in the following that programs are collections of procedures that call each
other, possibly according to recursive schemes. Formally, an integer program is an indexed
tuple $\mathcal{P} = \langle P_1, \ldots, P_n\rangle$, where $P_1, \ldots, P_n$ are procedures. Each procedure is a tuple
$P_i = \langle \mathbf{x}_i, \vec{x}_i^{\,in}, \vec{x}_i^{\,out}, S_i, q_i^{init}, F_i, \Delta_i\rangle$, where $\mathbf{x}_i$ are the local variables³ of $P_i$ ($\mathbf{x}_i \cap \mathbf{x}_j = \emptyset$
for all $i \neq j$), $\vec{x}_i^{\,in}, \vec{x}_i^{\,out} \subseteq \mathbf{x}_i$ are the ordered tuples of input and output variables, $S_i$ are
the control states of $P_i$ ($S_i \cap S_j = \emptyset$, for all $i \neq j$), $q_i^{init} \in S_i$ is the initial state, $F_i \subseteq S_i$ are
the final states of $P_i$, and $\Delta_i$ is a set of transitions of one of the following forms:

- $q \xrightarrow{R(\mathbf{x}_i, \mathbf{x}_i')} q'$ is an internal transition, where $q, q' \in S_i$, and $R(\mathbf{x}_i, \mathbf{x}_i')$ is a Presburger
arithmetic relation involving only the local variables of $P_i$;

- $q \xrightarrow{\vec{z}\,' = P_j(\vec{u})} q'$ is a call, where $q, q' \in S_i$, $P_j$ is the callee, $\vec{u}$ are linear terms over
$\mathbf{x}_i$, $\vec{z} \subseteq \mathbf{x}_i$ are variables, such that $|\vec{u}| = |\vec{x}_j^{\,in}|$ and $|\vec{z}| = |\vec{x}_j^{\,out}|$.

The call graph of a program $\mathcal{P} = \langle P_1, \ldots, P_n\rangle$ is a directed graph with vertices $P_1, \ldots, P_n$
and an edge $(P_i, P_j)$ if and only if $P_i$ has a call to $P_j$. A program is said to be recursive if
its call graph has at least one cycle, and non-recursive if its call graph is a dag. Finally,
let $\mathit{nF}(P_i)$ denote the set $S_i \setminus F_i$ of non-final states of $P_i$. Also $\mathit{nF}(\mathcal{P}) = \bigcup_{i=1}^{n} (S_i \setminus F_i)$.

Simplified syntax To ease the description of programs defined in this paper, we use a
simplified, human readable, imperative language such that each procedure of the program
conforms to the following grammar⁴:

P ::= proc P_i(id*) begin var id* S end

S ::= S; S | assume f | id^n ← t^n | id ← P_i(t*) | P_i(t*) | return (id + ε) | goto ℓ⁺ | havoc id⁺

The local variables occurring in P are denoted by id, linear terms by t, Presburger
formulae by f, and control labels by ℓ. Each procedure consists of local declarations
followed by a sequence of statements. Statements may carry a label. Program statements
can be either assume statements⁵, (parallel) assignments, procedure calls (possibly with
a return value), return to the caller (possibly with a value), non-deterministic jumps and

3 Observe that there are no global variables in the definition of integer program. Those can be 
encoded as input and output variables to each procedure. 

4 Our simplified syntax does not seek to capture the generality of integer programs. Instead, our 
goal is to give a convenient notation for the programs given in this paper and only those. 

5 assume f is executable if and only if the current values of the variables satisfy f.
proc P(x)
begin
  var z;
  assume x ≥ 0;
  goto then or else;
then: assume x > 0;
  z ← P(x − 1);
  z ← z + 2;
  return z;
else: assume x ≤ 0;
  z ← 0;
  havoc x;
  return z;
end



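[Control-flow graph of the integer program, with initial state $q_1^{init}$ and transitions labelled:
  t1: x ≥ 0 ∧ x' = x ∧ z' = z
  t2: x > 0 ∧ x' = x ∧ z' = z        t5: x ≤ 0 ∧ x' = x ∧ z' = z
  t3: z' = P(x − 1) ∧ x' = x          t6: z' = 0 ∧ x' = x
  t4: z' = z + 2 ∧ x' = x             t7: z' = z]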

Fig. 1. Example of a simplified imperative program and its integer program thereof 

havoc statements⁶. We consider the usual syntactic requirements (used variables must
be declared, jumps are well defined, no jumps outside procedures, etc.). We do not
define them; it suffices to know that all simplified programs in this paper comply with
the requirements. A program using the simplified syntax can be easily translated into
the formal syntax, as shown in Fig. 1.

Example 1. Figure 1 shows a program in our simplified imperative language and its corresponding
integer program $\mathcal{P}$. Formally, $\mathcal{P} = \langle P\rangle$ where $P$ is defined as:

$\langle \{x, z\}, \langle x\rangle, \langle z\rangle, \{q_1^{init}, q_2, q_3, q_4, q_6, q_7, q_f', q_f\}, q_1^{init}, \{q_f', q_f\}, \{t_1, t_2, t_3, t_4, t_5, t_6, t_7\}\rangle$.
Since $P$ calls itself ($t_3$), this program is recursive. ■

Semantics We are interested in computing the summary relation between the values of
the input and output variables of a procedure. To this end, we give the semantics of a
program $\mathcal{P} = \langle P_1, \ldots, P_n\rangle$ as a tuple of relations $R_q$ describing, for each non-final control
state $q \in \mathit{nF}(P_i)$, the effect of the program when started in $q$ upon reaching a state in
$F_i$. An interprocedurally valid path is represented by a tagged word over an alphabet $\hat{\Theta}$,
which maps each internal transition $t$ to a symbol $\tau$, and each call transition $t$ to a pair
of symbols $\langle\tau, \tau\rangle \in \hat{\Theta}$. In the sequel, we denote by $Q$ the variable corresponding to the
control state $q$, and by $\tau \in \Theta$ the alphabet symbol corresponding to the transition $t$ of
$\mathcal{P}$. Formally, we associate to $\mathcal{P}$ a visibly pushdown grammar, denoted in the rest of the
paper by $G_{\mathcal{P}} = (\mathcal{X}, \hat{\Theta}, \delta)$, such that:

- $Q \in \mathcal{X}$ if and only if $q \in \mathit{nF}(\mathcal{P})$;

- $Q \to \tau\, Q' \in \delta$ if and only if $t : q \xrightarrow{R} q'$ and $q' \in \mathit{nF}(\mathcal{P})$;

- $Q \to \tau \in \delta$ if and only if $t : q \xrightarrow{R} q'$ and $q' \notin \mathit{nF}(\mathcal{P})$;

- $Q \to \langle\tau\, Q_j^{init}\, \tau\rangle\, Q' \in \delta$ if and only if $t : q \xrightarrow{\vec{z}\,' = P_j(\vec{u})} q'$.
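
The construction above is mechanical, and a small Python sketch may help to see it at work on the program of Ex. 1. The encoding is an assumption of ours: a transition is a tuple `(name, source, target, callee)` with `callee = None` for internal transitions, nonterminals are the upper-cased state names, and the state names `q1`, ..., `qf`, `qf2` as well as the function name `grammar_of` are hypothetical.

```python
def grammar_of(transitions, non_final, init_state):
    """Build the productions of the visibly pushdown grammar of a program.

    transitions: list of (name, source, target, callee-or-None)
    non_final:   set of non-final control states
    init_state:  maps a procedure name to its initial control state
    """
    prods = []
    for name, src, dst, callee in transitions:
        head = src.upper()
        if callee is not None:
            # call transition: Q -> <t Q_j^init t> Q'
            body = ("<" + name, init_state[callee].upper(), name + ">", dst.upper())
        elif dst in non_final:
            # internal transition to a non-final state: Q -> t Q'
            body = (name, dst.upper())
        else:
            # internal transition to a final state: Q -> t
            body = (name,)
        prods.append((head, body))
    return prods


# Transitions of the program of Ex. 1 (state names are illustrative).
TRANSITIONS = [
    ("t1", "q1", "q2", None),
    ("t2", "q2", "q3", None),
    ("t3", "q3", "q4", "P"),
    ("t4", "q4", "qf", None),
    ("t5", "q2", "q6", None),
    ("t6", "q6", "q7", None),
    ("t7", "q7", "qf2", None),
]
NON_FINAL = {"q1", "q2", "q3", "q4", "q6", "q7"}
PRODS = grammar_of(TRANSITIONS, NON_FINAL, {"P": "q1"})
```

Each transition yields exactly one production, so the grammar has as many productions as the program has transitions.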

6 havoc x_1, x_2, ..., x_n assigns non-deterministically chosen integers to x_1, x_2, ..., x_n.
It is easily seen that interprocedurally valid paths in $\mathcal{P}$ and tagged words in $G_{\mathcal{P}}$ are
in one-to-one correspondence. In fact, each interprocedurally valid path of $\mathcal{P}$ between
a state $q \in \mathit{nF}(P_i)$ and a state of $F_i$, where $1 \leq i \leq n$, corresponds exactly to one tagged
word of $L_Q(G_{\mathcal{P}})$.

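Example 2. (continued from Ex. 1) The visibly pushdown grammar $G_{\mathcal{P}}$ corresponding
to $\mathcal{P}$ consists of the following variables and labelled productions:

$p_1^b \overset{def}{=} Q_1^{init} \to \tau_1\, Q_2$
$p_2^b \overset{def}{=} Q_2 \to \tau_2\, Q_3$
$p_3^c \overset{def}{=} Q_3 \to \langle\tau_3\, Q_1^{init}\, \tau_3\rangle\, Q_4$
$p_4^a \overset{def}{=} Q_4 \to \tau_4$
$p_5^b \overset{def}{=} Q_2 \to \tau_5\, Q_6$
$p_6^b \overset{def}{=} Q_6 \to \tau_6\, Q_7$
$p_7^a \overset{def}{=} Q_7 \to \tau_7$

$L_{Q_1^{init}}(G_{\mathcal{P}})$ includes the word $w = \tau_1\tau_2\langle\tau_3\tau_1\tau_5\tau_6\tau_7\tau_3\rangle\tau_4$, defining the nested word
$w\_nw(w) = (\tau_1\tau_2\tau_3\tau_1\tau_5\tau_6\tau_7\tau_3\tau_4, \{3 \leadsto 8\})$. The word $w$ corresponds to an interprocedurally valid
path where $P$ calls itself once. Let $\gamma_1 = p_1^b p_2^b p_3^c p_4^a p_1^b p_5^b p_6^b p_7^a$ and $\gamma_2 = p_1^b p_2^b p_3^c p_1^b p_5^b p_6^b p_7^a p_4^a$
be two control words; we have $Q_1^{init} \overset{\gamma_1}{\Rightarrow} w$ and $Q_1^{init} \overset{\gamma_2}{\Rightarrow} w$. ■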



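The semantics of a program is the union of the semantics of the nested words corresponding
to its executions, each of the latter being a relation over input and output
variables. To define the semantics of a nested word, we first associate to each $\tau \in \hat{\Theta}$ an
integer relation $\rho_\tau$, defined as follows:

- for an internal transition $t : q \xrightarrow{R} q' \in \Delta_i$, let $\rho_\tau \equiv R(\mathbf{x}_i, \mathbf{x}_i') \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_i}$;

- for a call transition $t : q \xrightarrow{\vec{z}\,' = P_j(\vec{u})} q' \in \Delta_i$, we define a call relation
$\rho_{\langle\tau} \equiv (\vec{x}_j^{\,in\,\prime} = \vec{u}) \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_j}$, a return relation $\rho_{\tau\rangle} \equiv (\vec{z}\,' = \vec{x}_j^{\,out}) \subseteq \mathbb{Z}^{\mathbf{x}_j} \times \mathbb{Z}^{\mathbf{x}_i}$ and a frame
relation $\phi_\tau \equiv \bigwedge_{x \in \mathbf{x}_i \setminus \vec{z}} x' = x \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_i}$.

We define the semantics of the program $\mathcal{P} = \langle P_1, \ldots, P_n\rangle$ in a top-down manner. Assuming
a fixed ordering of the non-final states in the program, i.e. $\mathit{nF}(\mathcal{P}) = \langle q_1, \ldots, q_m\rangle$,
the semantics of the program $\mathcal{P}$, denoted $[\![\mathcal{P}]\!]$, is the tuple of relations $\langle [\![q_1]\!], \ldots, [\![q_m]\!]\rangle$.
For each non-final control state $q \in \mathit{nF}(P_i)$ where $1 \leq i \leq n$, we denote by $[\![q]\!] \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_i}$
the relation (over the local variables of procedure $P_i$) defined as $[\![q]\!] = \bigcup_{\alpha \in L_Q(G_{\mathcal{P}})} [\![\alpha]\!]$.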

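It remains to define $[\![\alpha]\!]$, the semantics of the tagged word $\alpha$. Because it is more convenient,
we define the semantics of its corresponding nested word $w\_nw(\alpha) = (\tau_1 \ldots \tau_\ell, \leadsto)$
over alphabet $\Theta$. For a nesting relation $\leadsto\ \subseteq \{1, \ldots, \ell\} \times \{1, \ldots, \ell\}$, we denote by $\leadsto_{i,j}$
the relation $\{(s - (i-1), t - (i-1)) \mid (s, t) \in\ \leadsto \cap\ \{i, \ldots, j\} \times \{i, \ldots, j\}\}$, for some
$i, j \in \{1, \ldots, \ell\}$, $i \leq j$. Finally, we define $[\![(\tau_1 \ldots \tau_\ell, \leadsto)]\!] \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_i}$ (recall that $\alpha \in L_Q(G_{\mathcal{P}})$
and $q$ is a state of $P_i$) as follows:

$[\![(\tau_1 \ldots \tau_\ell, \leadsto)]\!] = \begin{cases} \rho_{\tau_1} & \text{if } \ell = 1 \\ \rho_{\tau_1} \circ [\![(\tau_2 \ldots \tau_\ell, \leadsto_{2,\ell})]\!] & \text{if } \ell > 1 \text{ and } 1 \not\leadsto j \text{ for all } 1 < j \leq \ell \\ R_\tau \circ [\![(\tau_{j+1} \ldots \tau_\ell, \leadsto_{j+1,\ell})]\!] & \text{if } \ell > 1 \text{ and } 1 \leadsto j \text{ for some } 1 < j < \ell \end{cases}$

where, in the last case, which corresponds to a call transition $t : q \xrightarrow{\vec{z}\,' = P_j(\vec{u})} q' \in \Delta_i$, we
have $\tau_1 = \tau_j = \tau$ and $R_\tau = \left(\rho_{\langle\tau} \circ [\![(\tau_2 \ldots \tau_{j-1}, \leadsto_{2,j-1})]\!] \circ \rho_{\tau\rangle}\right) \cap \phi_\tau$.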
Example 3. (continued from Ex. 2) Given the nested word $\theta = (\tau_1\tau_2\tau_3\tau_1\tau_5\tau_6\tau_7\tau_3\tau_4, \{3 \leadsto 8\})$,
its semantics, $[\![\theta]\!]$, is a relation between valuations of $\{x, z\}$, given by:

$\rho_{\tau_1} \circ \rho_{\tau_2} \circ \left( \left( \rho_{\langle\tau_3} \circ \rho_{\tau_1} \circ \rho_{\tau_5} \circ \rho_{\tau_6} \circ \rho_{\tau_7} \circ \rho_{\tau_3\rangle} \right) \cap \phi_{\tau_3} \right) \circ \rho_{\tau_4}$

One can verify that $[\![\theta]\!] \equiv x = 1 \wedge z' = 2$, i.e. the result of calling $P$ with an input valuation
$x = 1$ is the output valuation $z = 2$. ■
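
The claim of Ex. 3 can be checked by brute force. The Python sketch below makes one assumption that is not part of the paper: the integers are replaced by the small range `VALS`, so that relations become finite sets of pairs of valuations `(x, z)`. The helper names (`rel`, `compose`) are ours. The relations are those labelling the transitions of Fig. 1, and the expression computed is exactly the composition displayed above.

```python
from itertools import product

VALS = range(-2, 4)                      # bounded stand-in for Z (assumption)
STATES = list(product(VALS, VALS))       # a state is a valuation (x, z)


def rel(pred):
    """Finite relation {(s, t) | pred(s, t)} over STATES."""
    return {(s, t) for s in STATES for t in STATES if pred(s, t)}


def compose(r1, r2):
    """Relational composition r1 o r2."""
    succ = {}
    for s, t in r2:
        succ.setdefault(s, []).append(t)
    return {(s, u) for s, t in r1 for u in succ.get(t, ())}


# Internal transitions (x is s[0], z is s[1]).
t1 = rel(lambda s, t: s[0] >= 0 and t == s)
t2 = rel(lambda s, t: s[0] > 0 and t == s)
t4 = rel(lambda s, t: t == (s[0], s[1] + 2))
t5 = rel(lambda s, t: s[0] <= 0 and t == s)
t6 = rel(lambda s, t: t == (s[0], 0))
t7 = rel(lambda s, t: t[1] == s[1])          # havoc x; return z

# Call transition t3: z' = P(x - 1).
call3 = rel(lambda s, t: t[0] == s[0] - 1)   # callee input x is caller x - 1
ret3 = rel(lambda s, t: t[1] == s[1])        # caller z' is callee output z
frame3 = rel(lambda s, t: t[0] == s[0])      # caller x is preserved

inner = compose(compose(compose(t1, t5), t6), t7)
r_tau3 = compose(compose(call3, inner), ret3) & frame3
theta = compose(compose(compose(t1, t2), r_tau3), t4)
```

Within the chosen range, `theta` relates every valuation with `x = 1` (and arbitrary `z`) to the valuation `x = 1, z = 2`, and nothing else.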

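Finally, we introduce a few useful notations. By $[\![\mathcal{P}]\!]_q$ we denote the component
of $[\![\mathcal{P}]\!]$ corresponding to $q \in \mathit{nF}(\mathcal{P})$. Slightly abusing notations we define $L_{P_i}(G_{\mathcal{P}})$ as
$L_{Q_i^{init}}(G_{\mathcal{P}})$ and $[\![\mathcal{P}]\!]_{P_i}$ as $[\![\mathcal{P}]\!]_{q_i^{init}}$. Finally, define $[\![\mathcal{P}]\!]^{i/o}_{P_i} = \{\langle I{\downarrow}_{\vec{x}_i^{\,in}}, O{\downarrow}_{\vec{x}_i^{\,out}}\rangle \mid \langle I, O\rangle \in [\![\mathcal{P}]\!]_{P_i}\}$.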



4 Underapproximating the Program Semantics 

In this section we define a family of underapproximations of $[\![\mathcal{P}]\!]$ called bounded-index
underapproximations. Then we show that each $k$-index underapproximation of the semantics
of a (possibly recursive) program $\mathcal{P}$ coincides with the semantics of a non-recursive
program computable from $\mathcal{P}$ and $k$. The central notion of bounded-index
derivation is introduced next, followed by some of its basic properties.

Definition 1. Given a grammar $G$ with relation $\Rightarrow$ between strings, for every $k \geq 1$
we define the subrelation $\Rightarrow^{(k)}$ of $\Rightarrow$ as follows: $u \Rightarrow^{(k)} v$ iff $u \Rightarrow v$ and both $u$ and $v$
contain at most $k$ occurrences of variables. We denote by $\Rightarrow^{(k)*}$ the reflexive transitive
closure of $\Rightarrow^{(k)}$. Hence given $X$ and $k$ define $L_X^{(k)}(G) = \{w \in \Sigma^* \mid X \Rightarrow^{(k)*} w\}$ and we
call the derivation of $w \in \Sigma^*$ from $X$ a $k$-index derivation. A grammar $G$ is said to have
index $k$ whenever $L_X(G) = L_X^{(k)}(G)$ for each $X \in \mathcal{X}$.⁷

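Lemma 1. For every grammar the following properties hold: (1) $\Rightarrow^{(k)}\ \subseteq\ \Rightarrow^{(k+1)}$ for all
$k \geq 1$; (2) $\Rightarrow\ =\ \bigcup_{k=1}^{\infty} \Rightarrow^{(k)}$; (3) $BC \Rightarrow^{(k)*} w \in \Sigma^*$ iff there exist $w_1, w_2$ such that $w = w_1 w_2$
and either (i) $B \Rightarrow^{(k-1)*} w_1$, $C \Rightarrow^{(k)*} w_2$, or (ii) $C \Rightarrow^{(k-1)*} w_2$ and $B \Rightarrow^{(k)*} w_1$.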

The main intuition behind our method is to filter out interprocedurally valid paths
which cannot be produced by $k$-index derivations. Our analysis is then carried out on
the remaining paths produced by $k$-index derivations only. We argue that this underapproximation
technique is more general than bounding the stack space of the program,
which corresponds to filtering out derivations which are either non-leftmost⁸ or not $k$-index.



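Example 4. (continued from Ex. 2) $P$ is a (non-tail) recursive procedure and $G_{\mathcal{P}}$ models
its control flow. Inspecting $G_{\mathcal{P}}$ reveals that $L_{Q_1^{init}}(G_{\mathcal{P}}) = \{(\tau_1\tau_2\langle\tau_3)^n\ \tau_1\tau_5\tau_6\tau_7\ (\tau_3\rangle\tau_4)^n \mid n \geq 0\}$.
For each value of $n$ we give a 2-index derivation capturing the word: repeat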

7 Gruska [17] proved that deciding whether $L_X(G) = L_X^{(k)}(G)$ for some $k \geq 1$ is undecidable.

8 A leftmost derivation is a derivation where, at each step, the production that is applied rewrites 
the leftmost nonterminal. 
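$n$ times the steps $Q_1^{init} \overset{p_1^b p_2^b p_3^c}{\Longrightarrow} \tau_1\tau_2\langle\tau_3\, Q_1^{init}\, \tau_3\rangle Q_4 \overset{p_4^a}{\Longrightarrow} \tau_1\tau_2\langle\tau_3\, Q_1^{init}\, \tau_3\rangle\tau_4$ followed by the
steps $Q_1^{init} \overset{p_1^b p_5^b p_6^b p_7^a}{\Longrightarrow} \tau_1\tau_5\tau_6\tau_7$. Therefore the 2-index approximation of $G_{\mathcal{P}}$ shows that
$L_{Q_1^{init}}(G_{\mathcal{P}}) = L^{(2)}_{Q_1^{init}}(G_{\mathcal{P}})$. However, bounding the number of times $P$ calls itself up to 2
results in 3 interprocedurally valid paths (for $n = 0, 1, 2$).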
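
The effect of the index on Ex. 4 can be observed with a small enumeration. The Python sketch below explores sentential forms breadth first and discards every form with more than `k` nonterminals, which is our reading of Definition 1. The string encoding of symbols (`"t1"`, `"<t3"`, `"t3>"`), the length cut-off `max_len` and the function names are assumptions made for the illustration. The cut-off is sound here because no production shrinks a sentential form.

```python
from collections import deque

# Productions of the grammar of Ex. 2; keys are the nonterminals.
PRODS = {
    "Q1": [("t1", "Q2")],
    "Q2": [("t2", "Q3"), ("t5", "Q6")],
    "Q3": [("<t3", "Q1", "t3>", "Q4")],
    "Q4": [("t4",)],
    "Q6": [("t6", "Q7")],
    "Q7": [("t7",)],
}


def k_index_words(start, k, max_len):
    """Words of length <= max_len having a k-index derivation from start."""
    seen = {(start,)}
    words = set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        positions = [i for i, sym in enumerate(form) if sym in PRODS]
        if not positions:                 # no nonterminal left: a word
            words.add(form)
            continue
        for i in positions:               # rewrite any nonterminal, not only the leftmost
            for rhs in PRODS[form[i]]:
                new = form[:i] + rhs + form[i + 1:]
                within_index = sum(sym in PRODS for sym in new) <= k
                if within_index and len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return words


def word(n):
    """The word of Ex. 4 for a given recursion depth n."""
    return ("t1", "t2", "<t3") * n + ("t1", "t5", "t6", "t7") + ("t3>", "t4") * n
```

With index 1 only the non-recursive word `word(0)` is found, whereas index 2 already yields every word of the language up to the length cut-off, in line with the equality of Ex. 4.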

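Given $k \geq 1$, we define the $k$-index semantics of $\mathcal{P}$ as $[\![\mathcal{P}]\!]^{(k)} = \langle [\![q_1]\!]^{(k)}, \ldots, [\![q_m]\!]^{(k)}\rangle$,
where the $k$-index semantics of a non-final control state $q$ of a procedure $P_i$ is the relation
$[\![q]\!]^{(k)} \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_i}$, defined as $[\![q]\!]^{(k)} = \bigcup_{\alpha \in L_Q^{(k)}(G_{\mathcal{P}})} [\![\alpha]\!]$.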

4.1 Computing Bounded-index Underapproximations 

In what follows, we define a source-to-source transformation that takes as input a recursive
program $\mathcal{P}$ and an integer $k \geq 1$, and returns a non-recursive program $\mathcal{H}^k$ which has
the same semantics as $[\![\mathcal{P}]\!]^{(k)}$ (modulo projection on some variables). Therefore every
off-the-shelf tool that computes the summary semantics for a non-recursive program
can be used to compute the $k$-index semantics of $\mathcal{P}$, for any given $k \geq 1$.

Let $\mathcal{P} = \langle P_1, \ldots, P_n\rangle$ be a program, and $\mathbf{x} = \bigcup_{i=1}^{n} \mathbf{x}_i$ be the set of all variables in
$\mathcal{P}$. As we did previously, we assume a fixed ordering $\langle q_1, \ldots, q_m\rangle$ on the set $\mathit{nF}(\mathcal{P})$.
Let $G_{\mathcal{P}} = (\mathcal{X}, \hat{\Theta}, \delta)$ be the visibly pushdown grammar associated with $\mathcal{P}$, such that
each non-final state $q$ of $\mathcal{P}$ is associated with a nonterminal $Q \in \mathcal{X}$. Then we define a non-recursive
program $\mathcal{H}^K$ that captures the $K$-index semantics of $\mathcal{P}$ (Algorithm 1), for
$K \geq 1$. Formally, we define $\mathcal{H}^K = \langle query^k_{Q_1}, \ldots, query^k_{Q_m}\rangle_{k=0}^{K}$, where:

- for each $k = 0, \ldots, K$ and each control state $q \in \mathit{nF}(\mathcal{P})$, we have a procedure $query^k_Q$;

- in particular, each of $query^0_{Q_1}, \ldots, query^0_{Q_m}$ consists of one assume false statement;

- each procedure $query^k_Q$ has five sets of local variables, all of the same cardinality as
$\mathbf{x}$: two sets, named $\mathbf{x}_I$ and $\mathbf{x}_O$, are used as input variables, whereas the other three
sets, named $\mathbf{x}_J$, $\mathbf{x}_K$ and $\mathbf{x}_L$, are used locally by $query^k_Q$. Besides, $query^k_Q$ has a local
variable called PC. There are no output variables.

Observe that each procedure $query^k_Q$ calls only procedures $query^{k-1}_{Q'}$ for some $Q'$, hence
the program $\mathcal{H}^K$ is non-recursive, and therefore amenable to summarization techniques
that cannot handle recursion. Also the hierarchical structure of $\mathcal{H}^K$ enables modular
summarization by computing the summaries ordered by increasing values of $k = 0, 1, \ldots, K$.
The summaries of $\mathcal{H}^{K-1}$ are reused to compute $\mathcal{H}^K$. Finally, it is routine
to check that the size of $\mathcal{H}^K$ (viz. the number of statements) is in $O(K \cdot \sharp Prod)$ where
$\sharp Prod$ is the number of productions of $G_{\mathcal{P}}$. Consequently the time needed to generate
$\mathcal{H}^K$ is linear in the product $K \cdot \sharp Prod$.

Given that $query^k_Q$ has two copies of $\mathbf{x}$ as input variables, and no output variables,
the input-output semantics $[\![\mathcal{H}^K]\!]_{query^k_Q} \subseteq \mathbb{Z}^{\mathbf{x} \times \mathbf{x}}$ is a set of tuples, rather than a (binary)
relation. Given two valuations $I, O \in \mathbb{Z}^{\mathbf{x}}$, we denote by $I \cdot O \in \mathbb{Z}^{\mathbf{x} \times \mathbf{x}}$ their concatenation.

Thm. 1 relates the semantics of $\mathcal{H}^K$ and the $k$-index semantics of $\mathcal{P}$. Given $k$,
$1 \leq k \leq K$, and a control state $q$ of $\mathcal{P}$, we show equality between $[\![\mathcal{H}^K]\!]_{query^k_Q}$ and $[\![\mathcal{P}]\!]^{(k)}_q$
over common variables. Before starting, we fix an arbitrary value for $K$ and require that
each $k$ is such that $1 \leq k \leq K$. Hence, we drop $K$ in $\mathcal{H}^K$ and write $\mathcal{H}$.
Algorithm 1: proc query^k_Q(x_I, x_O) for k ≥ 1

begin
  var PC, x_J, x_K, x_L;
  PC ← Q;
start:
  goto p^a_1 or ... or p^a_na or p^b_1 or ... or p^b_nb or p^c_1 or ... or p^c_nc;
p^a_1:   assume (PC = head(p^a_1)); assume ρ_tail(p^a_1)(x_I, x_O); return;
  ...
p^a_na:  assume (PC = head(p^a_na)); assume ρ_tail(p^a_na)(x_I, x_O); return;
p^b_1:   assume (PC = head(p^b_1)); [ paste code for case tail(p^b_1) ∈ Θ × X ];
  ...
p^b_nb:  assume (PC = head(p^b_nb)); [ paste code for case tail(p^b_nb) ∈ Θ × X ];
p^c_1:   assume (PC = head(p^c_1)); [ paste code for case tail(p^c_1) ∈ (⟨Θ × X × Θ⟩) × X ];
  ...
p^c_nc:  assume (PC = head(p^c_nc)); [ paste code for case tail(p^c_nc) ∈ (⟨Θ × X × Θ⟩) × X ];
end

case: tail(p^b_i) = τ Q' ∈ Θ × X
  havoc (x_J);
  assume ρ_τ(x_I, x_J);
  x_I ← x_J;
  PC ← Q';                       // query^k_Q'(x_I, x_O)
  goto start;                    // return

case: tail(p^c_i) = ⟨τ Q_j^init τ⟩ Q' ∈ (⟨Θ × X × Θ⟩) × X
  havoc (x_J, x_K, x_L);
  assume ρ_⟨τ(x_I, x_J);         /* call relation */
  assume ρ_τ⟩(x_K, x_L);         /* return relation */
  assume φ_τ(x_I, x_L);          /* frame relation */
  goto ord or rod;
ord:                             /* in order exec. */
  query^(k−1)_Q_j^init(x_J, x_K);
  x_I ← x_L;
  PC ← Q';                       // query^k_Q'(x_I, x_O)
  goto start;                    // return
rod:                             /* out of order exec. */
  query^(k−1)_Q'(x_L, x_O);
  x_I ← x_J;
  x_O ← x_K;
  PC ← Q_j^init;                 // query^k_Q_j^init(x_I, x_O)
  goto start;                    // return

In Alg. 1, $p_i^\alpha$, where $\alpha \in \{a, b, c\}$, refers to a production of the visibly pushdown grammar $G_{\mathcal{P}}$.
The same symbol, used as a label, refers to the corresponding labelled statement in Alg. 1.
The superscript $\alpha \in \{a, b, c\}$ differentiates the productions according to whether they are of the
form $Q \to \tau$, $Q \to \tau\, Q'$ or $Q \to \langle\tau\, Q_j^{init}\, \tau\rangle\, Q'$, respectively.

One way to prove Thm. 1 consists in first unfolding the definitions of the semantics
as follows:

$[\![\mathcal{H}]\!]_{query^k_Q} = \bigcup_{\alpha \in L_{query^k_Q}(G_{\mathcal{H}})} [\![\alpha]\!] \qquad\qquad [\![\mathcal{P}]\!]^{(k)}_q = \bigcup_{\beta \in L^{(k)}_Q(G_{\mathcal{P}})} [\![\beta]\!]$

then establishing a
relationship between the $\alpha$'s and the $\beta$'s that implies the equivalence of their semantics
over common variables. Instead, we follow an equivalent, but more intuitive, approach
in which the semantics of $\mathcal{H}$ is obtained by interpreting directly its code. After all,
the interprocedurally valid paths in procedure $query^k_Q$ are in one-to-one correspondence
with the words of $L_{query^k_Q}(G_{\mathcal{H}})$.

An inspection of the code of $\mathcal{H}$ reveals that $\mathcal{H}$ simulates $k$-index depth first derivations
of $G_{\mathcal{P}}$ and interprets the statements of $\mathcal{P}$ on its local variables while applying
derivation steps. By considering not necessarily leftmost derivations, $\mathcal{H}$ interprets the
statements of $\mathcal{P}$ in an order which differs from the expected one.

Example 5. Let us consider an execution of $query^2_{Q_1^{init}}$ for the call $query^2_{Q_1^{init}}((1\ 0), (1\ 2))$
following $Q_1^{init} \overset{p_1^b p_2^b p_3^c}{\Longrightarrow} \tau_1\tau_2\langle\tau_3\, Q_1^{init}\, \tau_3\rangle Q_4 \overset{p_4^a}{\Longrightarrow} \tau_1\tau_2\langle\tau_3\, Q_1^{init}\, \tau_3\rangle\tau_4 \overset{p_1^b p_5^b p_6^b p_7^a}{\Longrightarrow} \tau_1\tau_2\langle\tau_3\tau_1\tau_5\tau_6\tau_7\tau_3\rangle\tau_4$.
In the table below, the first row (labelled k/PC) gives the caller (1 = $query^1_{Q_4}$, 2 =
$query^2_{Q_1^{init}}$) and the value of PC when control hits the labelled statement given at the
second row (labelled ip). The third row (labelled x_I/x_O) represents the content of the
two arrays. x_I/x_O = (a b)(c d) says that, in x_I, x has value a and z has value b; in x_O, x
has value c and z has value d.

k/PC     2/Q1init     2/Q1init     2/Q2         2/Q2         2/Q3         2/Q3         2/Q3
ip       start        p1           start        p2           start        p3           rod
x_I/x_O  (1 0)(1 2)   (1 0)(1 2)   (1 0)(1 2)   (1 0)(1 2)   (1 0)(1 2)   (1 0)(1 2)   (1 0)(1 2)

k/PC     1/Q4         1/Q4         2/Q1init     2/Q1init     2/Q2         2/Q2         2/Q6
ip       start        p4           start        p1           start        p5           start
x_I/x_O  (1 0)(1 2)   (1 0)(1 2)   (0 0)(42 0)  (0 0)(42 0)  (0 0)(42 0)  (0 0)(42 0)  (0 0)(42 0)

k/PC     2/Q6         2/Q7         2/Q7
ip       p6           start        p7
x_I/x_O  (0 0)(42 0)  (0 0)(42 0)  (0 0)(42 0)

The execution of $query^2_{Q_1^{init}}$ starts on row 1, column 1 and proceeds until the call to
$query^1_{Q_4}$ at row 2, column 1 (the out of order case). The latter ends at row 2, column
2, where the execution of $query^2_{Q_1^{init}}$ resumes. Since the execution is out of order, and
the previous havoc(x_J, x_K, x_L) results in x_J = (0 0), x_K = (42 0) and x_L = (1 0) (this
choice complies with the call relation), the values of x_I/x_O are updated to (0 0)/(42 0).
The choice for equal values (0) of z in both x_I and x_O is checked in row 3, column 3. ■

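Theorem 1. Let $\mathcal{P} = \langle P_1, \ldots, P_n\rangle$ be a program and let $q \in \mathit{nF}(P_i)$ be a non-final control
state of some $P_i = \langle \mathbf{x}_i, \vec{x}_i^{\,in}, \vec{x}_i^{\,out}, S_i, q_i^{init}, F_i, \Delta_i\rangle$. Then, for any $k \geq 1$, we have:

$[\![\mathcal{P}]\!]^{(k)}_q = \{\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \mid I \cdot O \in [\![\mathcal{H}]\!]_{query^k_Q}\}$

Consequently, we also have:

$[\![\mathcal{P}]\!]^{(k)\,i/o}_{P_i} = \{\langle I{\downarrow}_{\vec{x}_i^{\,in}}, O{\downarrow}_{\vec{x}_i^{\,out}}\rangle \mid I \cdot O \in [\![\mathcal{H}]\!]_{query^k_{Q_i^{init}}}\}$

The proof of Thm. 1 is based on the following lemma.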

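Lemma 2. Let $k \geq 1$, $q$ be a non-final control state of $P_i$ and $I, O \in \mathbb{Z}^{\mathbf{x}}$. If the call to
$query^k_Q(I, O)$ returns then $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k)}_q$. Conversely, if $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k)}_q$ then
there exist $I', O' \in \mathbb{Z}^{\mathbf{x}}$ such that $I'{\downarrow}_{\mathbf{x}_i} = I{\downarrow}_{\mathbf{x}_i}$, $O'{\downarrow}_{\mathbf{x}_i} = O{\downarrow}_{\mathbf{x}_i}$ and $query^k_Q(I', O')$ returns.

Proof. First we consider a tail-recursive version of Algorithm 1 which is obtained by replacing
every two statements of the form PC ← X; goto start; by query^k_X(x_I, x_O); return;
(as it appears in the comments of Alg. 1). The equivalence between Algorithm 1 and its
tail-recursive variant is an easy exercise.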
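"$\Leftarrow$" Let $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k)}_q$. By definition of $k$-index semantics, there exists
$\alpha \in L^{(k)}_Q(G_{\mathcal{P}})$ such that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\alpha]\!]$. Let $p_1$ be the first production used in the derivation
of $\alpha$ and let $\ell \geq 1$ be the length (in number of productions used) of the derivation.
Our proof proceeds by induction on $\ell$. If $\ell = 1$ then we find that $p_1$ must be of the form
$Q \to \tau$ and that $\alpha = \tau$. Therefore we have $[\![\alpha]\!] = [\![\tau]\!] = \rho_\tau$ and moreover $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in \rho_\tau$.
Since $k \geq 1$, we let $I' = I$ and $O' = O$, and we find that $query^k_Q(I', O')$ returns by choosing to
jump to the label corresponding to $p_1$, then executing the assume statement and finally
the return statement. When $\ell > 1$, the proof divides in two parts.

1. If $p_1$ is of the form $Q \to \tau\, Q'$ then we find that $\alpha = \tau\beta$. Moreover, $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in
[\![\alpha]\!] = \rho_\tau \circ [\![\beta]\!]$ by definition of the semantics. This implies that there exists $J \in \mathbb{Z}^{\mathbf{x}}$ such
that $\langle I{\downarrow}_{\mathbf{x}_i}, J{\downarrow}_{\mathbf{x}_i}\rangle \in \rho_\tau$ and $\langle J{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\beta]\!]$. Hence, we conclude from $\beta \in L^{(k)}_{Q'}(G_{\mathcal{P}})$ that
$\langle J{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k)}_{q'}$. Applying the induction hypothesis on this last fact, we find that the
call $query^k_{Q'}(J, O)$ returns. Finally consider the call $query^k_Q(I, O)$ where at label start
the jump goes to the label corresponding to $p_1$. At this point in the execution havoc(x_J)
returns $J$. Next assume $\rho_\tau(I, J)$ succeeds. Finally we find that the call to $query^k_Q(I, O)$
returns because so does the call $query^k_{Q'}(J, O)$ which is followed by return.

2. If $p_1$ is of the form $Q \to \langle\tau\, Q_j^{init}\, \tau\rangle\, Q'$ then we find that $\alpha = \langle\tau\, \beta'\, \tau\rangle\, \beta$ for some $\beta', \beta$.
Lemma 1 (prop. 3) shows that either $\beta' \in L^{(k-1)}_{Q_j^{init}}(G_{\mathcal{P}})$ and $\beta \in L^{(k)}_{Q'}(G_{\mathcal{P}})$, or $\beta' \in L^{(k)}_{Q_j^{init}}(G_{\mathcal{P}})$
and $\beta \in L^{(k-1)}_{Q'}(G_{\mathcal{P}})$. We will assume the former case, the latter being treated similarly.
Moreover, $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\alpha]\!] = R_\tau \circ [\![\beta]\!]$. This relation can be rewritten as
$\left((\rho_{\langle\tau} \circ [\![\beta']\!] \circ \rho_{\tau\rangle}) \cap \phi_\tau\right) \circ [\![\beta]\!]$, which by definition of $\beta$, $\beta'$ and the semantics is included in
$\left((\rho_{\langle\tau} \circ [\![\mathcal{P}]\!]^{(k-1)}_{q_j^{init}} \circ \rho_{\tau\rangle}) \cap \phi_\tau\right) \circ [\![\mathcal{P}]\!]^{(k)}_{q'}$. We conclude from the previous relation that there
exist $J, K, L \in \mathbb{Z}^{\mathbf{x}}$ such that $\langle I{\downarrow}_{\mathbf{x}_i}, J{\downarrow}_{\mathbf{x}_j}\rangle \in \rho_{\langle\tau}$, $\langle J{\downarrow}_{\mathbf{x}_j}, K{\downarrow}_{\mathbf{x}_j}\rangle \in [\![\mathcal{P}]\!]^{(k-1)}_{q_j^{init}}$, $\langle K{\downarrow}_{\mathbf{x}_j}, L{\downarrow}_{\mathbf{x}_i}\rangle \in \rho_{\tau\rangle}$,
and $\langle L{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k)}_{q'}$. Applying the induction hypothesis we obtain that the calls
$query^{k-1}_{Q_j^{init}}(J, K)$ and $query^k_{Q'}(L, O)$ return. Given those facts, it is routine to check that
$query^k_Q(I', O')$ returns by choosing to jump to the label corresponding to $p_1$, then having
havoc(x_J, x_K, x_L) return $(J, K, L)$, and we are done.

The proof for the only if direction is in appendix A.2. □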

As a last point, we prove that the bounded-index sequence
$\{\llbracket\mathcal{P}\rrbracket^{(k)}\}_{k=1}^{\infty}$ satisfies several conditions that advocate for its
use in program analysis, as an underapproximation sequence. The subset order and set union are
extended to tuples of relations, point-wise.

$\llbracket\mathcal{P}\rrbracket^{(k)} \subseteq \llbracket\mathcal{P}\rrbracket^{(k+1)}$ for all $k \ge 1$    (A1)

$\llbracket\mathcal{P}\rrbracket = \bigcup_{k=1}^{\infty} \llbracket\mathcal{P}\rrbracket^{(k)}$    (A2)

Condition (A1) requires that the sequence is monotonically increasing, the limit of this
increasing sequence being the actual semantics of the program (A2). These conditions
follow immediately from the first two points of Lemma 1. To decide whether
the limit $\llbracket\mathcal{P}\rrbracket$ has been reached by some iterate $\llbracket\mathcal{P}\rrbracket^{(k)}$, it is
enough to check that the tuple of relations in $\llbracket\mathcal{P}\rrbracket^{(k)}$ is inductive with respect to
the statements of $\mathcal{P}$. This can be implemented as an SMT query.
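
The inductiveness check can be illustrated on a toy case. The following Python sketch is not the SMT encoding used by the tool: it assumes a hypothetical recursive procedure timesTwo (return 0 if x = 0, otherwise timesTwo(x - 1) + 2) and tests, over a finite window of integers only, whether a candidate summary is closed under the statements of that procedure. The names `is_inductive`, `summary_ok` and `summary_bad` are illustrative assumptions.

```python
def is_inductive(summary, window):
    """Bounded check that `summary(x, r)` is closed under the statements of
    the hypothetical procedure timesTwo:
        base case:       summary(0, 0) must hold
        recursive case:  x > 0 and summary(x - 1, r)  ==>  summary(x, r + 2)
    A real implementation would discharge these implications with an SMT
    solver over all integers; here we only enumerate a finite window."""
    if not summary(0, 0):
        return False
    for x in window:
        for r in window:
            if x > 0 and summary(x - 1, r) and not summary(x, r + 2):
                return False
    return True


# Candidate summaries: relations between the input x and the return value r.
def summary_ok(x, r):
    return x >= 0 and r == 2 * x


def summary_bad(x, r):
    # A strict underapproximation: only inputs 0..3 are covered.
    return 0 <= x <= 3 and r == 2 * x


window = range(-5, 30)
print(is_inductive(summary_ok, window))
print(is_inductive(summary_bad, window))
```

The second candidate fails because the pair (3, 6) belongs to it while its successor (4, 8) does not, which is the kind of situation in which a further iterate of the sequence would be needed.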

5 Completeness of Underapproximations for Bounded Programs 

In this section we define a class of recursive programs such that the precise summary
semantics of each program in that class is effectively computable. We show for each
program $\mathcal{P}$ in the class that (a) $\llbracket\mathcal{P}\rrbracket = \llbracket\mathcal{P}\rrbracket^{(k)}$ for some value of $k \ge 1$, and moreover
(b) the semantics of $\mathcal{H}^k$ is effectively computable (and so is that of $\llbracket\mathcal{P}\rrbracket^{(k)}$ by Thm. 1).

Periodic Relations Given an integer relation $R \subseteq \mathbb{Z}^n \times \mathbb{Z}^n$, we define $R^0$ as the identity
relation on $\mathbb{Z}^n$, and $R^{i+1} = R^i \circ R$ for all $i \ge 0$. The closed form of $R$ is a formula defining
a relation $\widehat{R} \subseteq \mathbb{N} \times \mathbb{Z}^n \times \mathbb{Z}^n$ such that, for each $n \ge 0$, we have $\widehat{R}(n) = R^n$. In general, the
closed form of a relation is not definable within decidable subsets of integer arithmetic,
such as Presburger arithmetic. In this section we consider two classes of relations, called
periodic, for which this is possible, namely octagonal relations, and finite monoid affine
relations. The formal definitions are deferred to appendix A.3.
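
As a small illustration of a closed form, consider the hypothetical relation $x' = x + 2 \wedge x < 20$ (not taken from the paper). Its $n$-th power can be written as $x' = x + 2n \wedge (n = 0 \vee x + 2(n-1) < 20)$. The sketch below compares this formula against explicit relational composition on a finite domain; the helper names `compose`, `power` and `closed_form` are assumptions made for the example.

```python
def compose(r1, r2):
    """Relational composition of two finite relations (sets of pairs)."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}


def power(rel, n, domain):
    """R^0 is the identity on the domain, R^(i+1) = R^i o R."""
    result = {(x, x) for x in domain}
    for _ in range(n):
        result = compose(result, rel)
    return result


domain = range(0, 40)

# Hypothetical relation R: x' = x + 2, guarded by x < 20.
R = {(x, x + 2) for x in domain if x < 20}


def closed_form(n):
    """Candidate closed form: x' = x + 2n and (n = 0 or x + 2(n - 1) < 20)."""
    return {(x, x + 2 * n) for x in domain if n == 0 or x + 2 * (n - 1) < 20}


# The candidate agrees with explicit iteration for every tested exponent.
for n in range(0, 12):
    assert power(R, n, domain) == closed_form(n)
print("closed form agrees with R^n for n = 0..11 on the finite domain")
```

The check is only evidence on a bounded window; the point of periodic relations is that such a formula exists and can be computed symbolically.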

Bounded languages We define a bounded-expression $b$ to be a regular expression of
the form $b = w_1^* \ldots w_k^*$, where $k \ge 1$ and each $w_i$ is a non-empty word. A language (not
necessarily context-free) $L$ over alphabet $\Sigma$ is said to be bounded if and only if $L$ is
included in (the language of) a bounded expression $b$.
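
Membership in a bounded expression is a plain regular-expression match. The sketch below, with illustrative helper names `bounded_expression` and `is_included`, compiles $w_1^* \ldots w_k^*$ with Python's `re` module and tests a finite sample of words; a finite sample can refute inclusion but of course cannot prove it for an infinite language.

```python
import re


def bounded_expression(words):
    """Compile the bounded expression w_1* ... w_k* into a regex object."""
    if not words or any(w == "" for w in words):
        raise ValueError("need k >= 1 non-empty words")
    return re.compile("".join("(?:%s)*" % re.escape(w) for w in words))


def is_included(sample, words):
    """True if every word of the finite sample matches w_1* ... w_k*."""
    pattern = bounded_expression(words)
    return all(pattern.fullmatch(w) is not None for w in sample)


# Finite sample of the (non-regular) language { a^n b^n | n >= 0 }.
sample = ["a" * n + "b" * n for n in range(6)]
print(is_included(sample, ["a", "b"]))       # sample lies inside a* b*
print(is_included(["abab", "ba"], ["ab"]))   # "ba" is outside (ab)*
```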

Theorem 2 ([21]). Let $G = (\mathcal{X},\Sigma,\delta)$ be a grammar, and $X \in \mathcal{X}$ be a nonterminal, such
that $L_X(G)$ is bounded. Then $L_X(G) = L_X^{(k)}(G)$ for some $k \ge 1$.

The class of programs for which our method is complete is defined below: 

Definition 2. Let $\mathcal{P}$ be a program and $G_{\mathcal{P}} = (\mathcal{X},\Theta,\delta)$ be its corresponding visibly
pushdown grammar. Then $\mathcal{P}$ is said to be bounded periodic if and only if:

1. $L_X(G_{\mathcal{P}})$ is bounded for each $X \in \mathcal{X}$;

2. each relation $\rho_\tau$ occurring in the program, for some $\tau \in \Theta$, is periodic.

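Example 6. (continued from Ex. 4) Recall that $L_{Q_1^{init}}(G_{\mathcal{P}}) = L^{(2)}_{Q_1^{init}}(G_{\mathcal{P}})$, which equals
the set $\{(\tau_1\tau_2\langle\tau_3)^n\, \tau_1\tau_5\tau_6\tau_7\, (\tau_3\rangle\tau_4)^n \mid n \ge 0\} \subseteq (\tau_1\tau_2\langle\tau_3)^*\,\tau_1^*\,\tau_5^*\,\tau_6^*\,\tau_7^*\,(\tau_3\rangle\tau_4)^*$.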

Concerning condition 1, it is decidable [14] and previous work [16] defined a class
of programs following a recursion scheme which ensures boundedness of the set of
interprocedurally valid paths. Moreover, when condition 1 does not hold, one can still
pick a bounded expression $b$ and enforce boundedness by replacing $G_{\mathcal{P}}$ with a grammar
$G'_{\mathcal{P}}$ such that $L_X(G'_{\mathcal{P}}) = L_X(G_{\mathcal{P}}) \cap b$. Hence $G'_{\mathcal{P}}$ satisfies condition 1, although at the
price of coverage, since interprocedurally valid paths not in $b$ have been filtered out.

This section shows that the underapproximation sequence $\{\llbracket\mathcal{P}\rrbracket^{(k)}\}_{k=1}^{\infty}$ defined in
Section 4, when applied to any bounded periodic program $\mathcal{P}$, always yields $\llbracket\mathcal{P}\rrbracket$ in
finitely many steps, and moreover each iterate $\llbracket\mathcal{P}\rrbracket^{(k)}$ is computable and Presburger
definable. Furthermore the method can be applied as it is to bounded periodic programs,
without prior knowledge of the bounded expression $b \supseteq L_Q(G_{\mathcal{P}})$.

The proof goes as follows. Because $\mathcal{P}$ is bounded periodic, Thm. 2 shows that the
semantics $\llbracket\mathcal{P}\rrbracket$ of $\mathcal{P}$ coincides with its $k$-index semantics $\llbracket\mathcal{P}\rrbracket^{(k)}$ for some $k \ge 1$. Hence,
the result of Thm. 1 shows that for each $q \in \mathrm{nF}(\mathcal{P})$, the $k$-index semantics $\llbracket\mathcal{P}\rrbracket_q^{(k)}$ is
given by the semantics $\llbracket\mathcal{H}\rrbracket_{query^k_Q}$ of procedure $query^k_Q$ of the program $\mathcal{H}$. Then, because
$\mathcal{P}$ is bounded periodic, we show in Thm. 3 that every procedure $query^k_Q$ of program $\mathcal{H}$
is flattable (Def. 3). Finally, since all transitions of $\mathcal{H}$ are periodic and each procedure
$query^k_Q$ is flattable, $\llbracket\mathcal{P}\rrbracket$ is computable in finite time by existing tools, such as FAST
[6] or FLATA [9,8]. In fact, these tools are guaranteed to terminate provided that (a) the
input program is flattable; and (b) loops are labeled with periodic relations.

Definition 3. Let $\mathcal{P} = (P_1, \ldots, P_n)$ be a non-recursive program and $G_{\mathcal{P}} = (\mathcal{X},\Theta,\delta)$ be
its corresponding visibly pushdown grammar. Procedure $P_i$ is said to be flattable if
and only if there exists a bounded and regular language $R$ over $\Theta$, such that
$\llbracket\mathcal{P}\rrbracket_{P_i} = \bigcup_{\alpha \in L_{P_i}(G_{\mathcal{P}}) \cap R} \llbracket\alpha\rrbracket$.

Notice that a flattable program is not necessarily bounded (Def. 2), but its semantics can
be computed by looking only at a bounded subset of interprocedurally valid sequences
of statements.

Theorem 3. Let $\mathcal{P} = (P_1, \ldots, P_n)$ be a bounded periodic program, and let $q \in \mathrm{nF}(\mathcal{P})$.
Then, for any $k \ge 1$, procedure $query^k_Q$ of program $\mathcal{H}$ is flattable.

The proof of Thm. 3 roughly goes as follows: recall that we have $\llbracket\mathcal{P}\rrbracket_q = \llbracket\mathcal{P}\rrbracket_q^{(k)}$ for
each $q \in \mathrm{nF}(\mathcal{P})$ and so it is sufficient to consider the set $L_Q^{(k)}(G_{\mathcal{P}})$ of interprocedurally
valid paths. We further show (Thm. 4) that a strict subset of the $k$-index derivations of
$G_{\mathcal{P}}$ is sufficient to capture $L_Q^{(k)}(G_{\mathcal{P}})$. Moreover this subset of derivations is
characterizable by a bounded expression $b_\Gamma$ over the productions of $G_{\mathcal{P}}$. Then we use $b_\Gamma$ to give
a subset $f(b_\Gamma)$ of the interprocedurally valid paths of procedure $query^k_Q$ of $\mathcal{H}$ that is
sufficient to capture $\llbracket\mathcal{H}\rrbracket_{query^k_Q}$. Finally, using existing results, we show (Thm. 5) that
$f(b_\Gamma)$ is a bounded and regular set. Hence we conclude that each $query^k_Q$ is flattable. A
full proof of Thm. 3 is given in appendix A.6.

Control Sets Given a grammar $G = (\mathcal{X},\Sigma,\delta)$, we call any subset of $\delta^*$ a control set.
Let $\Gamma$ be a control set; we denote by
$L_X(\Gamma, G) = \{w \in \Sigma^* \mid \exists \gamma \in \Gamma : X \overset{\gamma}{\Rightarrow} w\}$ the set of
words resulting from derivations with control word in $\Gamma$.
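
To make the role of a control word concrete, here is a minimal sketch that replays control words on a small grammar and collects the terminal words they can produce. The grammar for $\{a^n b^n \mid n \ge 0\}$, the production names `p1`, `p2`, and the functions `apply_control_word` and `language_of_control_set` are all assumptions chosen for the example.

```python
def apply_control_word(productions, start, control_word):
    """All sentential forms reachable from `start` by applying the productions
    named in `control_word`, in that order, each to any occurrence of its
    head nonterminal. Forms are tuples of symbols."""
    forms = {(start,)}
    for name in control_word:
        head, tail = productions[name]
        next_forms = set()
        for form in forms:
            for pos, symbol in enumerate(form):
                if symbol == head:
                    next_forms.add(form[:pos] + tail + form[pos + 1:])
        forms = next_forms
    return forms


def language_of_control_set(productions, nonterminals, start, control_set):
    """Terminal words derivable with some control word of the control set."""
    words = set()
    for gamma in control_set:
        for form in apply_control_word(productions, start, gamma):
            if not any(s in nonterminals for s in form):
                words.add("".join(form))
    return words


# Hypothetical grammar: p1 = X -> a X b, p2 = X -> (empty word).
productions = {"p1": ("X", ("a", "X", "b")), "p2": ("X", ())}
control_set = [("p2",), ("p1", "p2"), ("p1", "p1", "p2"), ("p2", "p1")]
words = language_of_control_set(productions, {"X"}, "X", control_set)
print(sorted(words))
```

The control word `("p2", "p1")` contributes nothing, since no nonterminal is left after applying `p2`.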

Depth-first Derivations are defined as expected:

Definition 4 ([22]). Let $D \equiv X = w_0 \Rightarrow^* w_n = w$ be a derivation. Let $k > 0$, $x_i \in \Sigma^*$,
$A_i \in \mathcal{X}$ such that $w_m = x_0 A_1 x_1 \cdots A_k x_k$; and for each $i$, $1 \le i \le k$, let $f_m(i)$ denote the
index of the first word in $D$ in which the particular occurrence of variable $A_i$ appears.
Let $A_j$ be the nonterminal replaced in step $w_m \Rightarrow w_{m+1}$ of $D$. Then $D$ is said to be
depth-first if and only if for all $m$, $0 \le m < n$ we have $f_m(i) \le f_m(j)$, for all $1 \le i \le k$.

We define the set $DF_X(G)$ ($DF_X^{(k)}(G)$) of words produced using only depth-first derivations
(of index at most $k$) in $G$ starting from $X$. Clearly, $DF_X(G) \subseteq L_X(G)$ and
$DF_X^{(k)}(G) \subseteq L_X^{(k)}(G)$ for all $k \ge 1$. We further define the set $DF_X(\Gamma, G)$
($DF_X^{(k)}(\Gamma, G)$) of words produced using depth-first derivations (of index at most $k$)
with control words from $\Gamma$.
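
The two notions used here, the index of a derivation and the depth-first discipline, can be checked mechanically on a given derivation. The sketch below is an illustration under assumptions: the grammar, the production names and the function `analyse_derivation` are hypothetical, and each symbol is tagged with the step at which it first appeared, which plays the role of $f_m(i)$ in Definition 4.

```python
def analyse_derivation(productions, nonterminals, start, steps):
    """Replay a derivation and report (index, is_depth_first, final_word).

    `steps` is a list of pairs (k, name): production `name` is applied to the
    k-th nonterminal occurrence (0-based, left to right) of the current word.
    The index is the maximal number of nonterminal occurrences in any word of
    the derivation; the derivation is depth-first if every step rewrites one
    of the most recently introduced nonterminal occurrences."""
    form = [(start, 0)]
    index, depth_first = 1, True
    for m, (k, name) in enumerate(steps, start=1):
        head, tail = productions[name]
        occurrences = [i for i, (s, _) in enumerate(form) if s in nonterminals]
        pos = occurrences[k]
        if form[pos][0] != head:
            raise ValueError("production %s does not apply" % name)
        newest = max(form[i][1] for i in occurrences)
        if form[pos][1] != newest:
            depth_first = False
        form[pos:pos + 1] = [(s, m) for s in tail]
        index = max(index, sum(1 for s, _ in form if s in nonterminals))
    return index, depth_first, "".join(s for s, _ in form)


# Hypothetical grammar: S -> A B, A -> a A, A -> a, B -> b.
productions = {
    "p1": ("S", ("A", "B")),
    "p2": ("A", ("a", "A")),
    "p3": ("A", ("a",)),
    "p4": ("B", ("b",)),
}
nonterminals = {"S", "A", "B"}

# Rewrites B while a more recent A is still pending: not depth-first.
not_df = analyse_derivation(productions, nonterminals, "S",
                            [(0, "p1"), (0, "p2"), (1, "p4"), (0, "p3")])
# Same word, but the most recent nonterminal is always rewritten first.
df = analyse_derivation(productions, nonterminals, "S",
                        [(0, "p1"), (0, "p2"), (0, "p3"), (0, "p4")])
print(not_df, df)
```

Both derivations have index 2 and yield the same word, which matches the intuition that restricting to depth-first derivations need not lose words.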

The following theorem shows that $L_Q^{(k)}(G_{\mathcal{P}})$ is captured by a subset of depth-first
derivations whose control words belong to some bounded expression.

Theorem 4. Let $G = (\mathcal{X},\Theta,\delta)$ be a visibly pushdown grammar, $X_0 \in \mathcal{X}$ be a nonterminal
such that $L_{X_0}(G)$ is bounded. Then for each $k \ge 1$ there exists a bounded expression
$b_\Gamma$ over $\delta$ such that $DF_{X_0}^{(k)}(b_\Gamma, G) = L_{X_0}^{(k)}(G)$.

Finally, to conclude that $query^k_Q$ is flattable, we map the $k$-index depth-first derivations
of $G$ into the interprocedurally valid paths of $query^k_Q$. Then, applying Thm. 5 on
that mapping, we conclude the existence of a bounded and regular set of interprocedurally
valid paths of $query^k_Q$ sufficient to capture its semantics.

Theorem 5. Given two alphabets $\Sigma$ and $\Delta$, let $f$ be a function from $\Sigma^*$ into $\Delta^*$ such
that (i) if $u$ is a prefix of $v$ then $f(u)$ is a prefix of $f(v)$; (ii) there exists an integer $M$
such that $|f(wa)| - |f(w)| \le M$ for all $w \in \Sigma^*$ and $a \in \Sigma$; (iii) $f(\varepsilon) = \varepsilon$; (iv) $f^{-1}(R)$
is regular for all regular languages $R$. Then $f$ preserves regular sets. Furthermore, for
each bounded expression $b$ we have that $f(b)$ is bounded.



6 Experiments 



We have implemented the proposed method in the Flata verifier [15] and experimented with several
benchmarks. First, we have considered several programs, taken from [1], that perform arithmetic
and logical operations in a recursive way such as plus (addition), timesTwo (multiplication by two),
leq (comparison), and parity (parity checking). It is worth noting that these programs have finite
index and stabilization of the underapproximation sequence is thus guaranteed. Our technique
computes summaries by verifying that $\llbracket\mathcal{P}\rrbracket^{(2)} = \llbracket\mathcal{P}\rrbracket^{(3)}$ for all these benchmarks, see
Table 1 (the platform used for experiments is Intel® Core™2 Duo CPU P8700, 2.53GHz with
4GB of RAM).

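$F_a(x) = \begin{cases} x - 10 & \text{if } x \ge 101 \\ (F_a)^{a}(x + 10 \cdot a - 9) & \text{if } x \le 100 \end{cases}$
$\qquad G_b(x) = \begin{cases} x - 10 & \text{if } x \ge 101 \\ G(G(x + b)) & \text{if } x \le 100 \end{cases}$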

Next, we have considered the generalized McCarthy 91 function [10], a well-known
verification benchmark that has long been a challenge. We have automatically computed
precise summaries of its generalizations $F_a$ and $G_b$ above for $a = 2, \ldots, 8$ and
$b = 12, \ldots, 14$. The indices of the recursive programs implementing the $F_a$, $G_b$ functions
are not bounded; however, the sequence reached the fixpoint after at most 4 steps.
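
For readers who want to experiment with these benchmarks, the two generalizations can be written down directly as recursive functions. This is only an executable reading of the definitions above, not the tool's input format; the factory names `make_F` and `make_G` are assumptions. The value 91 for $F_a$ on inputs up to 100 is the classical behaviour of this family and is checked below by brute force on a finite range only.

```python
import sys

sys.setrecursionlimit(10000)  # generous bound; the recursion here is shallow


def make_F(a):
    """F_a(x) = x - 10 if x >= 101, else F_a applied a times to x + 10*a - 9."""
    def F(x):
        if x >= 101:
            return x - 10
        y = x + 10 * a - 9
        for _ in range(a):
            y = F(y)
        return y
    return F


def make_G(b):
    """G_b(x) = x - 10 if x >= 101, else G_b(G_b(x + b))."""
    def G(x):
        if x >= 101:
            return x - 10
        return G(G(x + b))
    return G


# Brute-force check on a finite range: F_a maps every x <= 100 to 91.
for a in range(2, 9):
    F = make_F(a)
    assert all(F(x) == 91 for x in range(-50, 101))

G12 = make_G(12)
print(G12(100), G12(99))
```

With b = 11 the function `make_G(11)` is the original McCarthy 91 function; for larger b the results on inputs up to 100 are no longer constant, as the two printed values show.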



Program       Time [s]    k
timesTwo      0.7         2
leq           0.7         2
parity        0.8         2
plus          3.4         2
$F_{a=2}$     3.7         3
$F_{a=8}$     45.1        4
$G_{b=12}$    5.7         3
$G_{b=13}$    19.1        3
$G_{b=14}$    24.2        3

Table 1. Experiments.
A Missing material



A.1 Proof of Lemma 1

Proof. The proofs of Properties 1 and 2 follow immediately from the definition of $\overset{(k)}{\Rightarrow}$.
Let us now turn to the proof of Property 3 (only if). First we define $w_1$ and $w_2$. Take the
derivation $BC \overset{(k)}{\Rightarrow}{}^{*} w$ and look at the last step. It must be of the form
$xYz \overset{(k)}{\Rightarrow} xyz = w$ and one of the following must hold: either $Y$ has been generated from $B$
or from $C$. Suppose that $Y$ stems from $C$ (the other case is treated similarly). In this case,
transitively remove from the derivation all the steps transforming the rightmost occurrence of $C$.
Hence we obtain a derivation $BC \overset{(k)}{\Rightarrow}{}^{*} w_1 C$. Then $w_2$ is the unique word satisfying
$w = w_1 w_2$. Since $BC \overset{(k)}{\Rightarrow}{}^{*} w_1 C$, we find by removing the occurrence of $C$ in rightmost
position at every step that $B \overset{(k-1)}{\Rightarrow}{}^{*} w_1$ and we are done. Having $Y$ stemming from $B$
yields $C \overset{(k-1)}{\Rightarrow}{}^{*} w_2$. For the proof of the other direction (if), assuming (2) (the other case
is similar), it is easily seen that $BC \overset{(k)}{\Rightarrow}{}^{*} w_1 C \overset{(k)}{\Rightarrow}{}^{*} w_1 w_2$. □



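A.2 Proof of Lemma 2, only if direction

Proof. Recall that in this proof we use the tail-recursive version of Algorithm 1 which
is obtained by replacing every two statements of the form PC ← X; goto start; by
$query^k_X(x_I, x_O)$; return; (as it appears in the comments of Alg. 1).

"⇒" Let $I, O \in \mathbb{Z}^{\mathbf{x}}$ such that the call to $query^k_Q(I,O)$ returns, that is, with parameters $I$
and $O$ procedure $query^k_Q$ has an execution that terminates with an empty call stack. We
show that $(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \llbracket\mathcal{P}\rrbracket_q^{(k)}$ by induction on the number of times
$\ell \ge 1$ a procedure of $\mathcal{H}$ is invoked. If $\ell = 1$ then the only invocation is $query^k_Q(I,O)$. So it is
necessarily the case that, at the non-deterministic jump labelled start, the destination has the form
$p_i^a$ for $1 \le i \le n_a$. Further, label $p_i^a$ corresponds to a production of the form $Q \to \tau$ of
$\delta$, hence we find that $\tau \in L_Q^{(k)}(G_{\mathcal{P}})$ since $k \ge 1$. Next, because the assume statement
succeeds, we find that $(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \rho_\tau$, hence that
$(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \llbracket\tau\rrbracket$, next that
$(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \bigcup_{\alpha \in L_Q^{(k)}(G_{\mathcal{P}})} \llbracket\alpha\rrbracket$ and finally that
$(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \llbracket\mathcal{P}\rrbracket_q^{(k)}$ by definition of $k$-index
semantics and we are done.

If $\ell > 1$, there are two possibilities for the first call to a procedure of $\mathcal{H}$ following
the call $query^k_Q(I,O)$.

- We are in the case $tail(p_i^b) = \tau\,Q'$ for some $1 \le i \le n_b$, and so $query^k_Q(I,O)$ executes
  havoc($x_J$), assume $\rho_\tau(x_I, x_J)$, $x_I \leftarrow x_J$, followed by $query^k_{Q'}(x_I, x_O)$ then return.
  Let us denote by $I$ and $J$ the content of $x_I$ before and after the assignment. By
  induction hypothesis, we find that $(J{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \llbracket\mathcal{P}\rrbracket_{q'}^{(k)}$,
  hence that there exists $\alpha \in L_{Q'}^{(k)}(G_{\mathcal{P}})$ such that
  $(J{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \llbracket\alpha\rrbracket$. Next $p_i^b$ corresponds to a production of
  the form $Q \to \tau\,Q'$ of $\delta$, hence we find that $\tau\alpha \in L_Q^{(k)}(G_{\mathcal{P}})$ since $k \ge 1$. Then, since
  $(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) = (I{\downarrow}_{\mathbf{x}_i}, J{\downarrow}_{\mathbf{x}_i}) \circ (J{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \rho_\tau \circ \llbracket\alpha\rrbracket$,
  we find that $(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \llbracket\tau\alpha\rrbracket$
  by definition of the semantics, hence that
  $(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \bigcup_{\alpha \in L_Q^{(k)}(G_{\mathcal{P}})} \llbracket\alpha\rrbracket$ and finally
  that $(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \llbracket\mathcal{P}\rrbracket_q^{(k)}$ by definition and we are done.

- We are in the case $tail(p_i^c) = \langle\tau\, Q_j^{init}\, \tau\rangle\, Q'$ for some $1 \le i \le n_c$. We further
  assume that the rod branch is executed (the ord being treated similarly). Therefore
  $query^k_Q(I,O)$ executes havoc($x_J, x_K, x_L$), assume $\rho_{\langle\tau}(x_I, x_J)$, assume $\rho_{\tau\rangle}(x_K, x_L)$,
  assume $\phi_\tau(x_I, x_L)$, $x_I \leftarrow x_J$, $x_O \leftarrow x_K$, followed by the calls
  $query^{k-1}_{Q'}(x_L, x_O)$, $query^k_{Q_j^{init}}(x_J, x_K)$ and then return. Call $J, K, L \in \mathbb{Z}^{\mathbf{x}}$
  the values picked by havoc.
  Following the induction hypothesis, we find that
  $(L{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \llbracket\mathcal{P}\rrbracket_{q'}^{(k-1)}$ and
  $(J{\downarrow}_{\mathbf{x}_j}, K{\downarrow}_{\mathbf{x}_j}) \in \llbracket\mathcal{P}\rrbracket_{q_j^{init}}^{(k)}$. This implies that there exist
  $\alpha \in L_{Q'}^{(k-1)}(G_{\mathcal{P}})$ and $\alpha' \in L_{Q_j^{init}}^{(k)}(G_{\mathcal{P}})$
  such that $(L{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \llbracket\alpha\rrbracket$ and
  $(J{\downarrow}_{\mathbf{x}_j}, K{\downarrow}_{\mathbf{x}_j}) \in \llbracket\alpha'\rrbracket$. Moreover, the definition of $p_i^c$
  and Lem. 1 (prop. 3) show that $\langle\tau\, \alpha'\, \tau\rangle\, \alpha \in L_Q^{(k)}(G_{\mathcal{P}})$.
  Next, $(I{\downarrow}_{\mathbf{x}_i}, J{\downarrow}_{\mathbf{x}_j}) \in \rho_{\langle\tau}$,
  $(J{\downarrow}_{\mathbf{x}_j}, K{\downarrow}_{\mathbf{x}_j}) \in \llbracket\alpha'\rrbracket$,
  $(K{\downarrow}_{\mathbf{x}_j}, L{\downarrow}_{\mathbf{x}_i}) \in \rho_{\tau\rangle}$ and
  $(I{\downarrow}_{\mathbf{x}_i}, L{\downarrow}_{\mathbf{x}_i}) \in \phi_\tau$
  show that $(I{\downarrow}_{\mathbf{x}_i}, L{\downarrow}_{\mathbf{x}_i}) \in R_\tau$ as given in the semantics. Again by the semantics,
  we find that $(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \llbracket\langle\tau\, \alpha'\, \tau\rangle\, \alpha\rrbracket$, hence that
  $(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \bigcup_{\alpha \in L_Q^{(k)}(G_{\mathcal{P}})} \llbracket\alpha\rrbracket$ and
  finally that $(I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}) \in \llbracket\mathcal{P}\rrbracket_q^{(k)}$ by definition of
  $k$-index semantics and we are done.

□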

A.3 Examples of Periodic Relations 

An octagonal relation is defined by a finite conjunction of constraints of the form
$\pm x \pm y \le c$, where $x$ and $y$ range over the set $\mathbf{x} \cup \mathbf{x}'$, and $c$ is an integer constant. The
transitive closure of any octagonal relation has been shown to be Presburger definable
and effectively computable [8].

A linear affine relation is defined by a formula $R(\mathbf{x},\mathbf{x}') \equiv C\mathbf{x} \ge \mathbf{d} \wedge \mathbf{x}' = A\mathbf{x} + \mathbf{b}$,
where $A \in \mathbb{Z}^{n \times n}$, $C \in \mathbb{Z}^{p \times n}$ are matrices and $\mathbf{b} \in \mathbb{Z}^n$, $\mathbf{d} \in \mathbb{Z}^p$. $R$ is said to have the finite
monoid property if and only if the set $\{A^i \mid i \ge 0\}$ is finite. It is known that the finite
monoid condition is decidable [7], and moreover that the transitive closure of a finite
monoid affine relation is Presburger definable and effectively computable [12,17].
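
The finite monoid property can be explored with a bounded search for a repeated power of the matrix. The sketch below is not the decision procedure of [7]: it only enumerates powers up to an arbitrary cut-off, so a negative answer is inconclusive. The function names and the two example matrices are assumptions made for the illustration.

```python
def mat_mul(a, b):
    """Product of two square integer matrices given as tuples of tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def finite_monoid_powers(a, max_power=64):
    """Return the distinct powers A^0, A^1, ... if some power repeats within
    `max_power` steps (so the monoid {A^i} is finite); return None if no
    repetition was found, which is inconclusive."""
    n = len(a)
    identity = tuple(tuple(int(i == j) for j in range(n)) for i in range(n))
    seen, current = [], identity
    for _ in range(max_power):
        if current in seen:
            return seen
        seen.append(current)
        current = mat_mul(current, a)
    return None


rotation = ((0, -1), (1, 0))   # A^4 is the identity: finite monoid
shear = ((1, 1), (0, 1))       # A^i = ((1, i), (0, 1)): powers never repeat

print(len(finite_monoid_powers(rotation)))
print(finite_monoid_powers(shear))
```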

A.4 Proof of Theorem 5

Definition 5 ([14]). A generalized sequential machine, abbreviated gsm, is a 6-tuple
$S = (K, \Sigma, \Delta, \delta, \lambda, q_1)$ where

- $K$ is a finite nonempty set (of states).

- $\Sigma$ is an alphabet (of inputs).

- $\Delta$ is an alphabet (of outputs).

- $\delta$ (the next-state function) is a mapping of $K \times \Sigma$ into $K$.

- $\lambda$ (the output function) is a mapping of $K \times \Sigma$ into $\Delta^*$.

- $q_1$ is a distinguished element of $K$ (the start state).

The functions $\delta$ and $\lambda$ are extended by induction to $K \times \Sigma^*$ by defining for every state
$q$, every word $x \in \Sigma^*$, and every $y$ in $\Sigma$

- $\delta(q, \varepsilon) = q$ and $\lambda(q, \varepsilon) = \varepsilon$.

- $\delta(q, xy) = \delta[\delta(q,x), y]$ and $\lambda(q, xy) = \lambda(q,x)\,\lambda[\delta(q,x), y]$.

It is readily seen that the second item holds for all words $x$ and $y$ in $\Sigma^*$.

Definition 6 ([14]). Let $S = (K, \Sigma, \Delta, \delta, \lambda, q_1)$ be a gsm. The operation defined by $S(x) =
\lambda(q_1, x)$ for each $x \in \Sigma^*$ is called a gsm mapping.
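
A gsm is straightforward to execute, which may help in reading the definitions above. The following sketch encodes $\delta$ and $\lambda$ as dictionaries and computes $S(x) = \lambda(q_1, x)$; the class name `GSM` and the concrete two-state machine are assumptions chosen for the example, not objects from the paper.

```python
from itertools import product


class GSM:
    """Generalized sequential machine with next-state map `delta` and output
    map `lam`, both dictionaries keyed by (state, input symbol)."""

    def __init__(self, delta, lam, start):
        self.delta, self.lam, self.start = delta, lam, start

    def run(self, word):
        """Return S(word) = lambda(q1, word), following the inductive
        extension of delta and lambda to words."""
        state, out = self.start, []
        for symbol in word:
            out.append(self.lam[(state, symbol)])
            state = self.delta[(state, symbol)]
        return "".join(out)


# Hypothetical machine: 'a' toggles the state, 'b' keeps it.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
lam = {(0, "a"): "x", (0, "b"): "", (1, "a"): "yy", (1, "b"): "z"}
gsm = GSM(delta, lam, start=0)

print(gsm.run("aab"))

# Properties (i) and (ii) of Theorem 5 on all words up to length 4:
# prefixes are preserved and each input symbol adds at most M = 2 outputs.
words = ["".join(w) for n in range(5) for w in product("ab", repeat=n)]
for w in words:
    for a in "ab":
        assert gsm.run(w + a).startswith(gsm.run(w))
        assert len(gsm.run(w + a)) - len(gsm.run(w)) <= 2
```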

Theorem 6 (Theorem 3.4.1 (if direction), [14]). Let $f$ be a function from $\Sigma^*$ into $\Delta^*$
such that (i) $f$ preserves prefixes, that is, if $u$ is a prefix of $v$ then $f(u)$ is a prefix of
$f(v)$; (ii) $f$ has bounded outputs, that is, there exists an integer $M$ such that $|f(wa)| -
|f(w)| \le M$ for all $w \in \Sigma^*$ and $a \in \Sigma$; (iii) $f(\varepsilon) = \varepsilon$; (iv) $f^{-1}(R)$ is regular for all regular
languages $R$. Then $f$ is a gsm mapping.

Theorem 7 (Theorem 3.3.2, [14]). Each gsm mapping preserves regular sets.

Lemma 3 (Lemma 5.5.3, [14]). $S(w_1^* \ldots w_n^*)$ is bounded for each gsm $S$ and all words
$w_1, \ldots, w_n$.

Finally, Theorem 5 is an easy consequence of the above facts.

A.5 Proof of Theorem 4

The proof is long but technically not difficult. First, we need to introduce some new
material. The Szilard language of a grammar $G = (\mathcal{X},\Sigma,\delta)$, denoted $Sz_X(G) \subseteq \delta^*$,
is the set of control words used in the derivations of $G$ starting with $X \in \mathcal{X}$. We denote
by $Sz_X^{df}(G)$ the set of control words used in the depth-first derivations of $G$ starting with
$X$. Moreover let $Sz_X^{df}(G,k)$ denote the set of control words used in depth-first $k$-index
derivations of $G$ starting with $X$. Next, we recall a couple of known results [22,20].

Lemma 4 ([22]). For all $k \ge 1$, we have $DF_X^{(k)}(G) = L_X^{(k)}(G)$ and $Sz_X^{df}(G,k)$ is regular.

Given an alphabet $\Sigma = \{u_1, \ldots, u_k\}$, let $Pk(u_i) = e_i$ be the $k$-dimensional vector having
1 on the $i$-th position and 0 everywhere else. We define $Pk(\varepsilon) = 0$,
$Pk(u_{i_1} \ldots u_{i_n}) = \sum_{j=1}^{n} Pk(u_{i_j})$ for any word $u_{i_1} \ldots u_{i_n} \in \Sigma^*$ and
$Pk(L) = \{Pk(w) \mid w \in L\}$ for any language $L \subseteq \Sigma^*$. The following result was proved in [13]:

Theorem 8 (Thm. 1 from [13], also in [20]). For every regular language $L$ there exists
a bounded expression $b_\Gamma$ such that $Pk(L \cap b_\Gamma) = Pk(L)$.
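
The Parikh image and the statement of Theorem 8 can be tried out on a small instance. In the sketch below the regular language $(ab \mid ba)^*$ and the bounded expression $(ab)^*$ are hypothetical choices, and the comparison is carried out only on words of length at most 8, so it illustrates the statement rather than proving it.

```python
import re
from itertools import product


def parikh(word, alphabet):
    """Parikh image: number of occurrences of each alphabet symbol."""
    return tuple(word.count(u) for u in alphabet)


alphabet = ("a", "b")
L = re.compile(r"(?:ab|ba)*")   # hypothetical regular language
b = re.compile(r"(?:ab)*")      # candidate bounded expression (ab)*

# All words over the alphabet of length at most 8.
words = ["".join(w) for n in range(9) for w in product(alphabet, repeat=n)]
in_L = [w for w in words if L.fullmatch(w)]
in_L_and_b = [w for w in in_L if b.fullmatch(w)]

pk_L = {parikh(w, alphabet) for w in in_L}
pk_L_and_b = {parikh(w, alphabet) for w in in_L_and_b}

# Intersecting with (ab)* removes words such as "ba" but no Parikh vector.
print(pk_L == pk_L_and_b, len(in_L), len(in_L_and_b))
```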

Next we prove a result characterizing a subset of derivations sufficient to capture a
bounded context-free language.

Lemma 5. Let $G = (\mathcal{X},\Sigma,\delta)$ be a grammar and $X \in \mathcal{X}$ be a nonterminal, such that
$L_X(G) \subseteq a_1^* \ldots a_d^*$ where $a_1, \ldots, a_d$ are distinct symbols of $\Sigma$. Then, for each $k \ge 1$ there
exists a bounded expression $b_\Gamma$ over $\delta$ such that $DF_X^{(k)}(b_\Gamma, G) = L_X^{(k)}(G)$.

Proof: We first establish the claim that for each $k \ge 1$, there exists a bounded expression
$b_\Gamma$ over $\delta$ such that $Pk(Sz_X^{df}(G,k) \cap b_\Gamma) = Pk(Sz_X^{df}(G,k))$. By Lemma 4, $Sz_X^{df}(G,k)$ is a
regular language, and by Theorem 8 there exists a bounded expression $b_\Gamma$ over $\delta$ such
that $Pk(Sz_X^{df}(G,k) \cap b_\Gamma) = Pk(Sz_X^{df}(G,k))$, which proves the claim. Next we prove that
$DF_X^{(k)}(b_\Gamma, G) = L_X^{(k)}(G)$.

Let $\delta = (X_i \to v_i)_{i=1}^{m}$ be the sequence of productions of $G$, taken in some fixed order.
For each right-hand side $v_i$ of a production in $\delta$, let $pk(v_i) \in \mathbb{Z}^d$ be the Parikh image of
the subword of $v_i$ obtained by taking the projection of $v_i$ on $a_1, \ldots, a_d$. Let $\Pi = [pk(v_i)]_{i=1}^{m}$
be the $m \times d$ matrix whose rows are the $pk(v_i)$ vectors. Let $X \overset{\gamma}{\Rightarrow} w$. Then we have
$Pk(w) = Pk(\gamma) \times \Pi$, and consequently, $Pk(\gamma_1) = Pk(\gamma_2)$ implies that $Pk(w_1) = Pk(w_2)$
for any two derivations $X \overset{\gamma_i}{\Rightarrow} w_i$ of $G$, $i = 1,2$. Moreover, the assumption
$L_X(G) \subseteq a_1^* \ldots a_d^*$ where $a_1, \ldots, a_d$ are distinct symbols shows that we further have $w_1 = w_2$.

We prove that $L_X^{(k)}(G) \subseteq DF_X^{(k)}(b_\Gamma, G)$, the other direction being immediate. By Lemma 4
we have $L_X^{(k)}(G) = DF_X^{(k)}(G)$. Let $w \in DF_X^{(k)}(G)$ be a word, and $X \overset{\gamma}{\Rightarrow} w$ be a
depth-first derivation of $w$ of index at most $k$. Since
$Pk(Sz_X^{df}(G,k) \cap b_\Gamma) = Pk(Sz_X^{df}(G,k))$, there exists a control
word $\beta \in Sz_X^{df}(G,k) \cap b_\Gamma$ such that $Pk(\beta) = Pk(\gamma)$, hence $X \overset{\beta}{\Rightarrow} w'$ and $w' = w$ as shown
above. □

For the rest of this section, let $G = (\mathcal{X},\Theta,\delta)$ be a visibly pushdown grammar (we ignore
for the time being the distinction between tagged and untagged alphabet symbols),
and $X_0 \in \mathcal{X}$ be an arbitrarily chosen nonterminal, and let $b = w_1^* \ldots w_d^*$ be a bounded
expression, where $w_i = b_1^{(i)} \ldots b_{j_i}^{(i)} \in \Theta^*$, for every $1 \le i \le d$. Let $G^b = (\mathcal{X}^b, \Theta, \delta^b)$ be
the regular grammar, where $\mathcal{X}^b = \{q_r^{(s)} \mid 1 \le s \le d \wedge 1 \le r \le j_s\}$ and:



(1) 
(2) 



It is routine to check that \J S=X L w (G b ) = w* . . . w* . Next, we define G M = (X M , 0, 8 M ): 

- X M = {X M } u { [qPxffl] \ X e X a q^ e X h a qf e X h a s < 









*\ | 1 < S < A 








V/ | 1 < s ^ s 1 


-1 






1 <5 <«j . 



- 8 M contains, for every 1 s < x n, a production X M — » Xbfj'j ], and: 



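• for every production $X \to \tau \in \delta$:

    $[q_r^{(s)} X q_v^{(u)}] \to \tau \in \delta^{\bowtie}$ if $q_r^{(s)} \to \tau\, q_v^{(u)} \in \delta^b$    (3)

• for every production $X \to \tau\, Y \in \delta$:

    $[q_r^{(s)} X q_v^{(u)}] \to \tau\, [q_z^{(t)} Y q_v^{(u)}] \in \delta^{\bowtie}$ if $q_r^{(s)} \to \tau\, q_z^{(t)} \in \delta^b$    (4)

• for every production $X \to \tau\, Z\, \sigma\, Y \in \delta$:

    $[q_r^{(s)} X q_v^{(u)}] \to \tau\, [q_z^{(t)} Z q_y^{(x)}]\, \sigma\, [q_c^{(d)} Y q_v^{(u)}] \in \delta^{\bowtie}$
    if $q_r^{(s)} \to \tau\, q_z^{(t)} \in \delta^b$ and $q_y^{(x)} \to \sigma\, q_c^{(d)} \in \delta^b$    (5)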

The set 8 N contains no other productions. For each nonterminal [q^Xq^] e X M , we 
define ^([q^Xq^]) =X. Further ^(X N ) =Xq. This notation is extended to productions 
from 8 M , hence sequences of productions in the obvious way. Further we define £(r) = 
{^(d) d e T} where T is a control set over 8 N . Finally, for a derivation D M = X™ => 

[q^XoqW] w in G N , let ^(D M ) = X Q w be the derivation of G obtained by 
applying to each production /? in D M . 

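Lemma 6. Let $G = (\mathcal{X},\Theta,\delta)$ be a visibly pushdown grammar, $X_0 \in \mathcal{X}$ be a nonterminal
such that $L_{X_0}(G) \subseteq b$ for a bounded expression $b = w_1^* \ldots w_d^*$. Then for every $k \ge 1$, the
following hold:

1. $L_{X_0}^{(k)}(G) = L_{X^{\bowtie}}^{(k)}(G^{\bowtie})$

2. Given a control set $\Gamma$ over $\delta^{\bowtie}$ such that $DF_{X^{\bowtie}}^{(k)}(\Gamma, G^{\bowtie}) = L_{X^{\bowtie}}^{(k)}(G^{\bowtie})$,
then the control set $\Gamma' = \xi(\Gamma)$ over $\delta$ satisfies $DF_{X_0}^{(k)}(\Gamma', G) = L_{X_0}^{(k)}(G)$.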

Proof, (sketch) The proof of point 1 is by induction. So we actually show the following 

(s) (u) w 

stronger statement. Let k 5= 1 and let w e £*. We show that Iqf'Xqi"'] ==>* w iff 

(.1) (u) W 

<7r =>* vf^v an d X ==> * w - The proof for the if direction is by induction on the 

G 

w 

length of X =^=>* w. 

(*) (s) 
"z = 1" Then X ==>* w for some production X — » x of 8 with w = x. Also ov — ► 

G 

Xq r i"' ) in 8 b and so by definition of G M we have [q^Xq^] — » x in 8 M and we are done. 

(*) 

"z > 1" We do a case analysis according to the tail of the first production in X ==>* 

w. 

(k) (*) ( s ) 

- X =^> xX' ==>* iw' = w which implies that X —* xX' is in 8. Further, Or =^>* 

G G 

wqi"^ shows that there exists q^ => xq^ ' =>* xw'g£"', hence that ^ — » iq^ 
is in 8 b , and finally find that [q^Xqf*] — x [q^X'qi^] belongs to 8 N . Also we 

conclude form the hypothesis that X' ==>* w' and q^, =>* w' qf and so, by 

G 

(«') M ^ 

induction hypothesis, we find that fa^ 'X'^ w' and we are done. 

r g m 

W (*) 

- X =^> XX0X2 ==>* xwi OW2 = w and so there exists X — » 1X10X2 in 8. More- 

G G 

over, since q\ r ^ =>* w#v"' we find that there exist q^ => xq^ =^>* xw\ q^ ' => 
XW\ oqc =^>* TwiO^J"'. Hence, the definition of G M shows that [q, Xqi" ] — * 

%\qa^Xiqt ']a[qc d ^X2qv ]. On the other hand, since X1X2 =^=>* wi W2 (simply 

(t-i) (*) 
delete x and a), Lemma [T] shows that either X\ =^=>* w\ and X2 =^=>* W2; or 

G G 

(*) 

=^=>* vvi and X2 =^=>* W5. Let us assume the latter holds (the other being 

G G 

(b) lb') ^ 

treated similarly). Applying the induction hypothesis, we find that [<7„ X\q , 1 =^=>* 
w\ and X2q\ ] =^=>* W2, hence we conclude the case with the fc-index deriva- 
tion [qPXqW] ^^[q^q^Mq^q^] — *t[,«Xi,JV* =^ 

(JM QtX G 

X Wl OW2. 

To conclude the "if" case, observe that L«|* (G) c w* . . . w* implies that for every 
w e Lj? (G) we have .Xo =^=>* w and also =>* wq[ s \ hence that w e (G). 

(.0 («) w 

Using a similar induction on the length of derivation \q, Xqi 1 =^=>* w, the "only 

if" direction is easily proved. 

For the proof of point 2. the "c" is obvious by definition of depth-first derivation. 
For the reverse direction "2" point 1 shows that L$ (G) = (G M ), hence using the 

assumption we find that DF (k J, (T, G M ) = L$ (G). So let D = X M w be a depth- 

first fc-index derivation of G M with control word conforming to Y. Now consider Z,(D), 
it defines again a depth-first fc-index derivation. Further, the definition of Z, shows that 
the word generated by Z,(D) is w. □ 
Let 2L = {flj, . . . ,04} be an alphabet disjoint from ©, and a language homomorphism 
/1 : — > ©*, defined as /i(a,) = w,-, for all 1 i < a!. We now obtain from G M a grammar 
G', over J?, such that L Xq (G n ) = h(L Xo (G a )). Define G a = (X M , J?, 8"), as the result of 
applying onto G M the following transformation on every p e S 1 * 1 : if p was defined using 
a production qr — > y e 8 P where r = j s then replace the corresponding occurrence 
of y in p by a s , else (r # y' s ) replace the corresponding occurrence of yby e. In this way 
we can map the productions of G M onto productions of G" . This mapping is extended to 
the derivations of G M . The information kept within the nonterminal of X™ is sufficient 
to also define the reverse mapping, from the productions (derivations) of G" back to 
the productions (derivations) of G M . We define the mapping V : 8° —* 8 M as follows, for 
a, be Avj {e}: 

- v(feWx 9 W] - a) = [qPXqV] - ft« 

- V([^W] [q^Yq^]) = [qPxqM]^ [q^Yq[ x) ] 

- v([qi s) XqM]^a[qi z) Yqi v) ] b [qfZq^]) = fe (z) I^ v) ] ft£ v) Z<7 

Lemma 7. Let $G = (\mathcal{X},\Theta,\delta)$ be a visibly pushdown grammar, $X_0 \in \mathcal{X}$ be a nonterminal,
and $w_1^* \ldots w_d^*$ be a bounded expression over $\Theta$. Also let $\mathcal{A} = \{a_1, \ldots, a_d\}$ be an alphabet
disjoint from $\Theta$, and $h : \mathcal{A}^* \to \Theta^*$ be the homomorphism defined as $h(a_i) = w_i$, for all
$1 \le i \le d$. Then for every $k \ge 1$, the following hold:

1. L«(G")=/z- 1 (^(G M ))nflf---fl* 

2. given a control setT" over h a such that DF^iT ,G a ) = L^(G J ), then the control 



setV = v(r a ) over8 M satisfies DF {k ^{Y' ,G M ) = L {k l{G M ) 



A o 

Proof, (sketch) The proof of Point 1 is by induction showing the following stronger 
statement: let we©* then we have [o^ W h) 1 =i=L* w iff [o^Xo! 10 ! /z" 1 (w) n 

a* • • The proof is done by induction on the length of the derivations similarly to 

Lemma [6] It follows that L {k l(G a ) = K' \ w[ l £ L ( ^(G M )}, hence that 

z o x o 

h(L {k) M (G"))= L {k l (G M ) by definition of h and since L {k) M (G M ) c w* • • • w*. For point 

2, applying h on both side of the assumption to obtain h(DF^\\ (T a ,G a )) = (G a )), 

x o x o 

hence 

h{DF {k l (T , G") ) = (G M ) by point 1 . To conclude the proof, it is sufficient to show 

^0 ^0 

that h(DF ik l(r\G a )) = DF ( * ) (v(P'),G M ). Again an induction proof is called for. □ 

x o X (l__^ 

Before proving Theorem 4 we recall the following result about homomorphisms
and bounded languages. Let $g : \Sigma \to \mathcal{A}^*$ be a homomorphism that maps each symbol of
$\Sigma$ into a word over $\mathcal{A}$, and $L \subseteq w_1^* \ldots w_d^*$ where $w_1^* \cdots w_d^*$ is a bounded expression. Then
$g(L)$ is also bounded$^{9}$.

Finally, the actual proof of Theorem 4 goes as follows.
Proof. Let R = {«i, . . . ,a</} be an alphabet disjoint from 0, and let h : 2L — > ©* be the 
language homomorphism defined by h(at) = Wj, for all 1 sg i sc: d. By applying Lemma[6] 

(first point), and then Lemma (first point) we find that lj$ (G) = h(L (k l (G a )). Next, 

o x 

applying Lemma [5] on L^(G°) we obtain a bounded expression bps over 8" such 
that DF\l (br n , G") = L ^(G a ). Our next step is to apply the results of Lemma [7] 

(second point), and Lemma |6] (second point) in that order to obtain that lJ$ (G) = 

DF^ (t, (v (br° ) ) , G) . Finally, since br« is a bounded expression, and \ and v are homo- 
morphisms (and so is the composition ^»v)we have that ^(v(bpO) is bounded, hence 
included in a bounded expression and we are done. □ 



A.6 Proof of Theorem 3

Lemma 8. Let $G = (\mathcal{X},\Theta,\delta)$ be a visibly pushdown grammar such that for all productions
$p \in \delta$ all nonterminals occurring in $tail(p)$ are distinct. Let $X \in \mathcal{X}$ and $\gamma \in \delta^*$, then
there exists at most one depth-first derivation of $G$ with control word $\gamma$, hence at most
one word resulting from it.

9 Alternatively, it can also be shown using Theorem 5.






Proof: By contradiction, suppose there exist two depth-first derivations from $X$ with
control word $p_1 \ldots p_n$. This means that there exists an $i$, $1 \le i \le n$, such that
$X = w_0 \overset{p_1}{\Rightarrow} w_1 \cdots w_{i-1} \overset{p_i}{\Rightarrow} w_i$ and $w_i$ contains two occurrences of the
nonterminal $head(p_i)$, that is $w_i = \alpha A_1 \beta A_2 \gamma$ where $A_1 = A_2 = head(p_i)$ and
$\alpha, \beta, \gamma \in (\mathcal{X} \cup \Theta)^*$. Two cases arise:

1. $A_1$ and $A_2$ result from the occurrence of some $p_j$ with $j < i$, which contradicts that
all nonterminals occurring in $tail(p_j)$ are distinct.

2. $A_1$ and $A_2$ result from the occurrence of $p_k$ and $p_l$ with $k \ne l$ respectively. Following
the definition of depth-first derivation, $p_i$ must be applied to $A_1$ if $k > l$; and to $A_2$
if $k < l$. In either case $p_i$ can be applied to only one of the two occurrences, which
contradicts the existence of two depth-first derivations.

□

Note that because the grammars of this paper stem from programs, we can assume
without loss of generality that the condition on $tail(p)$ holds for every production $p$
of every grammar in this paper.

Finally, the proof of Thm. 3 goes as follows:
Proof. (sketch) Since $\mathcal{P}$ is bounded periodic we can apply Theorem 4 showing there
exists a bounded expression $b_\Gamma$ over $\delta$ such that $DF_Q^{(k)}(b_\Gamma, G_{\mathcal{P}}) = L_Q^{(k)}(G_{\mathcal{P}})$. Hence we
find that $\llbracket\mathcal{P}\rrbracket_q^{(k)} = \bigcup_{\alpha \in L_Q^{(k)}(G_{\mathcal{P}})} \llbracket\alpha\rrbracket = \bigcup_{\alpha \in DF_Q^{(k)}(b_\Gamma, G_{\mathcal{P}})} \llbracket\alpha\rrbracket$.

Let $\alpha \in DF_Q^{(k)}(b_\Gamma, G_{\mathcal{P}})$ and let $\gamma$ be the control word of the derivation $D_\gamma$ thereof,
which is unique by Lemma 8. We then prove that $D_\gamma$ corresponds to a unique
interprocedurally valid path $\beta$ of $query^k_Q$, that is $\beta \in L_{query^k_Q}(G_{\mathcal{H}})$. This is however easily
seen looking at the code of $query^k_Q$ whose control flow follows precisely a depth-first
$k$-index derivation. Because $\gamma$, the control word over $\delta$, determines uniquely $D_\gamma$ and hence $\beta$,
we conclude that there exists a function $f$ that associates to each word over $\Theta$ a unique
word over $\Delta$ (call $\Delta$ the alphabet of $G_{\mathcal{H}}$). Moreover define $f(\varepsilon) = \varepsilon$. Basically, $f$ maps each
production $p$ of $D_\gamma$ to a labelled statement $p$ in $\mathcal{H}$. Moreover, between two consecutive
labelled statements $p$ and $p'$, $f$ is stuffing a sequence of statements of $\mathcal{H}$ which is unique
for the reason that $D_\gamma$ is unique.

Next, we show that $f(b_\Gamma)$ is a bounded regular set over $\Delta$ using Thm. 5. To this end,
we need to show that $f$ satisfies the properties (i) to (iv) in Thm. 5. Following the
previous explanations on $f$, that is, stuffing sequences of statements between consecutive
productions, it is seen that (i) holds; (ii) holds because the number of statements
added between any two consecutive productions is bounded; (iii) holds by definition;
and finally (iv) holds because $f^{-1}$ consists in deleting statements not referring to
productions, which clearly preserves regularity.

We then conclude from Thm. 5 that $f(b_\Gamma)$ is a bounded and regular language. Back
to $\llbracket\mathcal{H}\rrbracket_{query^k_Q}$, we find that

and that $query^k_Q$ is flattable since $f(b_\Gamma)$ is a bounded regular set. □